Abstract

This paper treats the problem of screening for variables with high correlations in
high dimensional data in which there can be many fewer samples than variables. We
focus on threshold-based correlation screening methods for three related applications:
screening for variables with large correlations within a single treatment (auto-correlation
screening); screening for variables with large cross-correlations over two treatments
(cross-correlation screening); screening for variables that have persistently large
auto-correlations over two treatments (persistent-correlation screening). The novelty of
correlation screening is that it identifies a smaller number of variables which are highly
correlated with others, as compared to identifying a number of correlation parameters.
Correlation screening suffers from a phase transition phenomenon: as the correlation
threshold decreases the number of discoveries increases abruptly. We obtain asymptotic
expressions for the mean number of discoveries and the phase transition thresholds as
a function of the number of samples, the number of variables, and the joint sample
distribution. We also show that under a weak dependency condition the number of 
discoveries is dominated by a Poisson random variable giving an asymptotic expression 
for the false positive rate. The correlation screening approach bears tremendous divi- 
dends in terms of the type and strength of the asymptotic results that can be obtained. 
It also overcomes some of the major hurdles faced by existing methods in the litera- 
ture as correlation screening is naturally scalable to high dimension. Numerical results 
strongly validate the theory that is presented in this paper. We illustrate the applica- 
tion of the correlation screening methodology on a large scale gene-expression dataset, 
revealing a few influential variables that exhibit a significant amount of correlation over 
multiple treatments. 

Keywords: High dimensional inference, Variable selection, Phase transition, Poisson
limit, Renyi entropy, Thresholding, Sparsity, False discovery.






1 Introduction 



Consider the problem of screening for variables that have significant correlations in a large 
data set. Examples of such data sets are gene expression arrays, multimedia databases, 
multivariate financial time series, and traffic over the Internet. Correlation screening can be 
used to discover a small number of variables that are highly correlated or whose correlations 
have distinct patterns, or motifs, that are not likely to occur by chance. Indeed, filtering 
out all but the highest sample correlations may be the only practical way to examine depen- 
dencies in massive datasets where computational limitations prevent the experimenter from 
evaluating all sample correlations. As an example, in multi-chip gene expression data the 
number of pairwise correlations can be in the billions. 

Thresholding the sample correlation matrix is an attractive screening method due to its 
simplicity. However, the threshold must be chosen with care due to the existence of an abrupt 
phase transition phenomenon controlling the number of discoveries. When the correlation 
threshold falls below a critical point the number of discoveries abruptly and rapidly increases, 
even when the variables are uncorrelated. This critical point can be close to one when the 
number p of variables greatly exceeds the number n of samples. Therefore a poorly selected 
correlation threshold may result in an overwhelmingly large number of discoveries. This 
paper provides theory that predicts the location of this critical point as a function of n, p, 
and the joint distribution of the variables. When the population covariance matrix is of large 
dimension and sparse the theory specifies universal thresholds that do not depend on the 
unknown multivariate sample density. 

We distinguish between three types of screening which arise in practical applications in- 
volving a single treatment or a pair of treatments. Each type of screening seeks to discover 
variables with the property that they are highly correlated with at least one other variable. 
The first application involves screening for variables that are highly correlated with other 
variables undergoing the same treatment. The second application is screening for variables
in one treatment that are highly correlated with variables undergoing a different treatment. 
The third application is screening for variables with high within-treatment correlation that
persists over a pair of treatments. Precise definitions are given in Section 3. We respec- 
tively call these three applications auto-correlation screening, cross-correlation screening, 
and persistent-correlation screening. In each of these problems the location of the phase 
transition critical point is different. 

For each of these three applications we index the correlation threshold $\rho_p$ by the number
of variables p. We give asymptotic conditions on the sequence $\{\rho_p\}_p$ that guarantee a finite
and non-zero mean number of discoveries. These conditions, which depend on the number 
n of samples, can be used to guide the selection of an appropriate correlation threshold 
in practical applications. Under these conditions we derive asymptotic expressions for the 
mean number of discoveries. These expressions depend on a Bhattacharyya measure [3] of 
average pairwise dependency of the p multivariate U-scores defined on the $(n-2)$-dimensional
hypersphere. It is through this pairwise dependency measure that the population covariance 
matrix influences the mean number of discoveries. 

We establish simple achievable bounds that give insight into factors that determine the 
mean number of discoveries. These bounds involve Renyi entropy [18] and other information 
theoretic quantities. For example, we show that the mean discovery rate is proportional to 
the order 2 Renyi entropy of the average marginal density of associated U-scores if and only 
if these scores are independent identically distributed. Under this i.i.d. condition the mean 
number of auto-correlation screening discoveries is minimized for the case of uniformly dis- 
tributed U-scores. This establishes a minimal property of the p-variate spherical distribution 
over the elliptical diagonal dispersion family. 

Using the expressions for the mean number of discoveries we specify the critical point $\rho_c$ of
the phase transition. As either p increases or n decreases $\rho_c$ approaches one, making reliable
screening impossible, and $\rho_c$ approaches this limit with rate roughly equal to $(p-1)^{-2/(n-4)}$.
In particular, for auto-correlation screening, when n > 4 and p is large:
$\rho_c = \sqrt{1 - c_n (p-1)^{-2/(n-4)}}$,
where $c_n$ depends on the aforementioned Bhattacharyya measure of average pairwise
dependency of the U-scores and only depends weakly on n.

We also establish that under a weak dependency assumption the number of discoveries 
is asymptotically dominated by a related Poisson random variable. In the case of auto- 
correlation and cross-correlation screening this Poisson variable is the number of positive 
vertex degrees in the associated sample correlation graph. In the case of persistent-correlation 
screening the dominating Poisson variable is the correlation of the vertex degrees in the 
sample correlation graphs associated with each treatment. The weak dependency condition 
on the average U-score pairwise distributions is satisfied for variables whose covariance matrix 
is sparse or whose correlations are small. 

These dominance results specify an asymptotic expression for the false positive rate of 
discoveries that can be used to select the screening threshold to control the familywise
discovery rate. Familywise discovery rate has been widely used in variable selection problems.
The rate function in our derived Poisson limit specifies the marginal false discovery rate asso- 
ciated with a particular correlation threshold. While we do not explore it in this paper, when 
suitably corrected for dependency, the associated p-values might also be used to control the 
conditional false discovery rate. For a given pair of variables and a given screening threshold, 
the bias-corrected normal approximation to the Fisher Z transformed sample correlations al- 
lows us to approximate the minimum detectable correlation between the variables. We give a 
numerical example that provides experimental validation and illustrates the practical utility 
of our theoretical predictions for large but finite p and small n. We then apply our method 
to correlation screening of a large scale Affymetrix gene micro-array dataset for analysis of 
a four treatment beverage intake experiment [4]. 

The correlation screening problem treated here is not related to inverse covariance and 
covariance selection problems studied by many authors (see [6, 13, 19, 9, 8, 17] to name just 
a few from an increasing literature). Unlike these authors who are interested in correlation 
or covariance matrix estimation with respect to a matrix error norm, here we are concerned 
with detection of a few variables with large correlation coefficients. Unlike previous work 
in covariance selection we provide precise phase transition thresholds that are applicable
to large scale screening for correlation and persistence in single and multiple treatments. 
This paper is related to tests of significance for covariance and correlation matrices [11, 7], 
but our focus is correlation screening instead of testing for diagonal covariance or for other 
structure. Tests of diagonal covariance structure are often based on the maximum sample 
correlation coefficient, which has recently been studied in the large p regime [12, 14, 15, 16, 
21]. Unlike the correlation screening results shown in this paper, these studies often impose 
more stringent (Gaussian) assumptions on the joint distribution of the variables and do not 
consider the case of persistent maximal correlation. On the other hand, our results could be 
of practical value in both covariance selection and correlation tests of significance, especially 
when p is large. 

Correlation screening is an effective method for discovering a few highly correlated vari- 
ables when there are no response variables in the data, i.e., it is an unsupervised method. 
While our formulation of correlation screening does not specifically target the supervised 
problem of variable selection for regression, the correlation screening framework can be ap- 
plied to this setting. Specifically, the experimenter would apply correlation screening to a 
sample of concatenated vectors containing both independent variables and response vari- 
ables. Any independent variable discoveries that have high cross-correlation with a response 
variable would be excellent candidates to include in the regression algorithm. 

The outline of the paper is as follows. In Section 2 the main assumptions are stated 
and the mathematical notation is given. In Section 3 the different kinds of correlation 
screening tests are defined and the asymptotic theory is developed and discussed. In Section
4 the asymptotic theory is specialized to the case of block-sparse population covariance. In 
Section 5 numerical results and experiments are presented to illustrate the theory. Proofs 
of the principal results in the paper are given in the Appendix/Supplemental Section. We 
also refer the reader to a technical report which contains more details on the results in this 
paper (see [10]). 
2 Preliminaries



In this section we set the notation and recall some classical results on sample correlation. 
See Anderson [1], for example, for more background. 

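Let $\mathbf{X} = [X_1, \ldots, X_p]^T$ be a vector of random variables with mean $\mu$ and $p \times p$ covariance
matrix $\Sigma$. Define the correlation matrix $\Gamma = D_\Sigma^{-1/2} \Sigma D_\Sigma^{-1/2}$ where $D_\Sigma = \mathrm{diag}_i(\Sigma_{ii})$ is the
diagonal matrix of variances of components of $\mathbf{X}$. Assume that n samples of $\mathbf{X}$ are available
and arrange these samples in an $n \times p$ data matrix

$$\mathbb{X} = [\mathbf{X}_1, \ldots, \mathbf{X}_p] = [\mathbf{X}_{(1)}^T, \ldots, \mathbf{X}_{(n)}^T]^T,$$

where $\mathbf{X}_i = [X_{1i}, \ldots, X_{ni}]^T$ and $\mathbf{X}_{(i)} = [X_{i1}, \ldots, X_{ip}]$ denote the i-th column and row,
respectively, of $\mathbb{X}$. Note that most of the results in this paper hold when the rows of $\mathbb{X}$ are
dependent.

Define the sample mean of the i-th column $\bar{X}_i = n^{-1} \sum_{j=1}^n X_{ji}$, the vector of sample
means $\bar{\mathbf{X}} = [\bar{X}_1, \ldots, \bar{X}_p]$, the $p \times p$ sample covariance matrix
$S = \frac{1}{n-1} \sum_{i=1}^n (\mathbf{X}_{(i)} - \bar{\mathbf{X}})^T (\mathbf{X}_{(i)} - \bar{\mathbf{X}})$,
and the $p \times p$ sample correlation matrix $R = D_S^{-1/2} S D_S^{-1/2}$, where $D_S = \mathrm{diag}_i(S_{ii})$ is
the diagonal matrix of component sample variances. Let the ij-th entry of the ensemble
correlation matrix $\Gamma$ be denoted $\gamma_{ij}$ and the ij-th entry of the sample correlation matrix R be $r_{ij}$.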

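The multivariate Z-scores $\mathbf{Z}_i \in \mathbb{R}^n$ are constructed by standardizing the columns $\mathbf{X}_i$ of
$\mathbb{X}$ to have sample mean equal to zero and sample variance equal to one

$$\mathbf{Z}_i = \frac{\mathbf{X}_i - \bar{X}_i \mathbf{1}}{\sqrt{S_{ii}(n-1)}}, \quad i = 1, \ldots, p,$$

where $\mathbf{1}$ is a vector of ones. Equivalently, $\mathbb{Z} = [\mathbf{Z}_1, \ldots, \mathbf{Z}_p] = (n-1)^{-1/2} (I - n^{-1} \mathbf{1}\mathbf{1}^T) \mathbb{X} D_S^{-1/2}$.
The Z-scores lie on the intersection of the $n-1$ dimensional hyperplane $\{\mathbf{u} \in \mathbb{R}^n : \mathbf{1}^T \mathbf{u} = 0\}$
and the $n-1$ dimensional sphere $\{\mathbf{u} \in \mathbb{R}^n : \|\mathbf{u}\|_2 = 1\}$. The correlation matrix has the
Z-score representation $R = \mathbb{Z}^T \mathbb{Z}$.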
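The Z-score construction can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name `z_scores` and the synthetic Gaussian data are illustrative choices and are not part of the paper.

```python
import numpy as np

def z_scores(X):
    """Return the n x p matrix of Z-scores of an n x p data matrix X."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    centered = X - X.mean(axis=0)                 # remove column sample means
    s_ii = centered.var(axis=0, ddof=1)           # sample variances S_ii
    return centered / np.sqrt(s_ii * (n - 1))     # scale by sqrt(S_ii (n - 1))

rng = np.random.default_rng(0)                    # illustrative synthetic data
X = rng.normal(size=(6, 4))                       # n = 6 samples, p = 4 variables
Z = z_scores(X)
R = Z.T @ Z                                       # Z-score representation of R
```

With this scaling each column of `Z` sums to zero and has unit Euclidean norm, so `R` should agree with the usual sample correlation matrix up to floating-point error.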
An equivalent representation for the sample correlation matrix R uses what we call the
U-scores, $\mathbf{U}_i \in \mathbb{R}^{n-1}$:

$$R = \mathbb{U}^T \mathbb{U}, \qquad (2.1)$$

where $\mathbb{U} = [\mathbf{U}_1, \ldots, \mathbf{U}_p]$ is $(n-1) \times p$. The U-scores lie on the $(n-2)$-sphere $S_{n-2}$ in
$\mathbb{R}^{n-1}$ and are constructed by projecting away the components of the $\mathbf{X}_i$'s orthogonal to the
$n-1$ dimensional hyperplane $\{\mathbf{u} \in \mathbb{R}^n : \mathbf{1}^T \mathbf{u} = 0\}$, $i = 1, \ldots, p$. Specifically, define
the orthogonal $n \times n$ matrix $H = [n^{-1/2} \mathbf{1}, H_{2:n}]$. The matrix $H_{2:n}$ can be obtained by
Gram-Schmidt orthogonalization and satisfies the properties

$$\mathbf{1}^T H_{2:n} = [0, \ldots, 0], \qquad H_{2:n}^T H_{2:n} = I_{n-1}.$$

The U-score matrix $\mathbb{U} = [\mathbf{U}_1, \ldots, \mathbf{U}_p]$ is obtained from $\mathbb{Z}$ by the following relation

$$\mathbb{U} = H_{2:n}^T \mathbb{Z}. \qquad (2.2)$$

Furthermore, the sample correlation between $\mathbf{X}_i$ and $\mathbf{X}_j$ can be computed using the inner
product or the Euclidean distance between associated U-scores

$$r_{ij} = \mathbf{U}_i^T \mathbf{U}_j = 1 - \frac{\|\mathbf{U}_i - \mathbf{U}_j\|_2^2}{2}. \qquad (2.3)$$

As the U-score is an $(n-1)$-element vector it is a more compact representation of the
sample correlation than the n-element Z-score vector. More importantly, the U-score lives
in a geometry, the $(n-2)$-sphere of co-dimension 1 shown in Fig. 1, that is simpler than
that of the standard Z-score.
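The relations (2.1)-(2.3) can be illustrated with a short sketch, assuming NumPy. Here the matrix $H_{2:n}$ is obtained from a QR factorization rather than an explicit Gram-Schmidt pass; this is a substitution made for convenience, and the name `u_scores` is hypothetical.

```python
import numpy as np

def u_scores(X):
    """Return the (n-1) x p matrix of U-scores of an n x p data matrix X."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    centered = X - X.mean(axis=0)
    Z = centered / np.linalg.norm(centered, axis=0)   # unit-norm Z-scores
    # Build an orthogonal matrix whose first column is proportional to the
    # ones vector; the remaining n-1 columns play the role of H_{2:n}.
    M = np.eye(n)
    M[:, 0] = 1.0
    Q, _ = np.linalg.qr(M)                            # QR used in place of Gram-Schmidt
    H2 = Q[:, 1:]                                     # n x (n-1), columns orthogonal to 1
    return H2.T @ Z                                   # relation (2.2)

rng = np.random.default_rng(0)                        # illustrative synthetic data
X = rng.normal(size=(5, 3))                           # n = 5 samples, p = 3 variables
U = u_scores(X)
R = U.T @ U                                           # representation (2.1)
r01_from_distance = 1.0 - 0.5 * np.sum((U[:, 0] - U[:, 1]) ** 2)   # relation (2.3)
```

Any orthonormal basis of the hyperplane orthogonal to the ones vector gives the same `R`; different choices of $H_{2:n}$ only rotate the U-scores on the sphere.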

Elliptically contoured distributions

The results in this paper hold for a wide class of sample distributions that include light
and heavy tailed distributions such as the multivariate normal and multivariate student-t,
respectively. A random vector $\mathbf{X}$ is said to follow an elliptical distribution with location
parameter $\mu$ and dispersion matrix parameter $\Sigma$ if its density has the form

$$f_{\mathbf{X}}(\mathbf{x}) = |\Sigma|^{-1/2} g\left((\mathbf{x}-\mu)^T \Sigma^{-1} (\mathbf{x}-\mu)\right), \qquad (2.4)$$

where $g(u)$ is a non-negative monotonic function. When $\Sigma$ is a diagonal matrix the elliptical
distribution is called diagonal elliptical. It is well known that when the rows of the data
matrix $\mathbb{X}$ are i.i.d. and follow a diagonal elliptical distribution the U-scores are uniformly
distributed on $S_{n-2}$, see for example [1, Sec. 2.7]. In the case of non-diagonal $\Sigma$ the
distribution of the U-scores over the sphere $S_{n-2}$ will generally be far from uniform (Fig. 1). The
U-score representations (2.1) and (2.3) of the sample correlation will be a key ingredient for
deriving the asymptotic results in this paper.

Invoked in the sequel will be the following sparsity condition on the dispersion matrix.
The matrix $\Sigma = (\sigma_{ij})$ is said to be row-sparse of degree k if every row has fewer than
k + 1 non-zero entries. Formally,

$$\{i : |\{j : \sigma_{ij} \neq 0\}| > k\} = \emptyset, \qquad (2.5)$$

where $\emptyset$ is the empty set. When the matrix is row-sparse of degree q and there exists a
permutation that block diagonalizes $\Sigma$ then the matrix satisfies the q-sparse condition of
Sec. 4.

Relevant definitions: The asymptotic expressions for the mean number of discoveries in
the next section will be a function of several quantities introduced below.

Spherical Cap Probability
Figure 1: The U-scores associated with n = 4 realizations of 500 variables are $(n-1)$-element
vectors that lie on the unit $(n-2)$-dimensional sphere $S_{n-2}$. Shown are U-scores for a
multivariate normal sample. At left: for diagonal covariance matrix the 500 U-scores are uniformly
distributed over $S_{n-2}$. At right: for a non-diagonal covariance the U-scores are far from
uniformly distributed on $S_{n-2}$. Pairs of U-scores that are close to each other, as measured
by Euclidean distance, have high associated sample correlations.



Define

$$P_0 = P_0(\rho, n) = a_n \int_\rho^1 (1 - u^2)^{\frac{n-4}{2}} \, du, \qquad (2.6)$$

where $a_n$ is

$$a_n = \frac{2\Gamma((n-1)/2)}{\sqrt{\pi}\, \Gamma((n-2)/2)}. \qquad (2.7)$$

The quantity $P_0/2$ is equal to the proportional area of the spherical cap of radius
$r = \sqrt{2(1-\rho)}$ on $S_{n-2}$. It is the probability that a uniformly distributed point U on the sphere
lies in a pair of hyperspherical cones symmetric about the origin. This probability expression
was derived in the context of the spherical normal distribution by Ruben [20, Eq. 4.1].^1 A
power series expansion of the integral in (2.6) yields the relation, accurate as $\rho$ approaches 1:

$$P_0(\rho, n) = (n-2)^{-1} a_n (1 - \rho^2)^{(n-2)/2} \left(1 + O(1 - \rho^2)\right). \qquad (2.8)$$

^1 The integral in [20, Eq. 4.1] is obtained from the integral in (2.6) by making the change of variable
$\theta = \arccos(u)$.
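The definitions (2.6)-(2.8) can be evaluated directly. The sketch below uses only the Python standard library; the midpoint-rule integrator, the step count and the function names are assumptions made for illustration, and the code assumes n >= 4 so that the integrand is bounded.

```python
import math

def a_n(n):
    """Normalizing constant (2.7), computed through log-gamma to avoid overflow."""
    log_ratio = math.lgamma((n - 1) / 2) - math.lgamma((n - 2) / 2)
    return 2.0 * math.exp(log_ratio) / math.sqrt(math.pi)

def P0(rho, n, steps=20000):
    """Midpoint-rule evaluation of the integral (2.6); assumes n >= 4."""
    h = (1.0 - rho) / steps
    total = 0.0
    for k in range(steps):
        u = rho + (k + 0.5) * h
        total += (1.0 - u * u) ** ((n - 4) / 2)
    return a_n(n) * total * h

def P0_approx(rho, n):
    """Leading term of the expansion (2.8), intended for rho close to 1."""
    return a_n(n) / (n - 2) * (1.0 - rho * rho) ** ((n - 2) / 2)
```

For example, `P0(0.0, n)` should be close to one for any admissible n, since the integrand in (2.6) is twice the density of a single coordinate of a uniform point on $S_{n-2}$, and `P0(rho, n) / P0_approx(rho, n)` should approach one as `rho` approaches 1.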

Relevant entropy and divergence quantities

For a given density f on $S_{n-2}$ define the following entropy-related functional, which
satisfies the indicated inequality

$$H_2(f) = |S_{n-2}| \int_{S_{n-2}} f^2(\mathbf{u}) \, d\mathbf{u} \ge 1. \qquad (2.9)$$

Equality is attained in the inequality (2.9) if and only if (iff) f is the uniform density:
$f(\mathbf{u}) = |S_{n-2}|^{-1}$. $H_2(f)$ is a monotonic transformation of the Renyi entropy of f of order 2:

$$-\log\left(|S_{n-2}|^{-1} H_2(f)\right).$$
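As a small numerical illustration of (2.9), consider the case n = 3, where $S_{n-2}$ is the unit circle and $|S_1| = 2\pi$. The sketch below uses only the standard library; the von Mises-type density, its concentration value and the function names are hypothetical choices for illustration.

```python
import math

def H2_circle(density, m=20000):
    """Riemann-sum estimate of |S_1| * integral of f^2 over the unit circle (n = 3)."""
    h = 2.0 * math.pi / m
    vals = [density(k * h) for k in range(m)]
    mass = sum(vals) * h                      # renormalize so that f integrates to one
    return 2.0 * math.pi * sum((v / mass) ** 2 for v in vals) * h

def uniform_density(theta):
    """Unnormalized uniform density on the circle."""
    return 1.0

def von_mises_density(theta, kappa=2.0):
    """Unnormalized von Mises density with concentration kappa (illustrative value)."""
    return math.exp(kappa * math.cos(theta))

h2_uniform = H2_circle(uniform_density)       # attains the lower bound in (2.9)
h2_peaked = H2_circle(von_mises_density)      # strictly larger than one
renyi_2_peaked = -math.log(h2_peaked / (2.0 * math.pi))   # order-2 Renyi entropy
```

The uniform density attains the bound $H_2 = 1$, while a concentrated density gives a larger value and hence a smaller order-2 Renyi entropy.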

For a joint density $f_{U,V}$ on $S_{n-2} \times S_{n-2}$ with marginals $f_U$ and $f_V$ define

$$J(f_{U,V}) = |S_{n-2}| \int_{S_{n-2}} f_{U,V}(\mathbf{u}, \mathbf{u}) \, d\mathbf{u}. \qquad (2.10)$$

It will be shown that $J(f_{U,V})$ influences the mean number of discoveries. Therefore, several
intuitive interpretations are given below that will be of use in the sequel.

First, $J(f_{U,V})$ is a measure of dependence between U, V. Specifically, it is equal to the
Bhattacharyya affinity between $f_V(\mathbf{w}) f_U(\mathbf{w})$ and the product $f_{U|V}(\mathbf{w}|\mathbf{w}) f_{V|U}(\mathbf{w}|\mathbf{w})$:

$$J(f_{U,V}) = |S_{n-2}| \int \sqrt{f_{U|V}(\mathbf{w}|\mathbf{w}) f_{V|U}(\mathbf{w}|\mathbf{w})} \sqrt{f_U(\mathbf{w}) f_V(\mathbf{w})} \, d\mathbf{w}. \qquad (2.11)$$

This is maximized when U, V are statistically independent.

Second, the following asymptotic representation follows from (A.16):

$$P\left(\min\{\|\mathbf{U} - \mathbf{V}\|_2, \|\mathbf{U} + \mathbf{V}\|_2\} \le \sqrt{2(1 - \rho)}\right)$$

The limit is equal to one when U and V are independent and uniformly distributed on
$S_{n-2}$. Thus $J(f_{U,V}) - 1$ is a measure of the deviation of the joint density from uniform
$f_{U,V}(\mathbf{u}, \mathbf{v}) = |S_{n-2}|^{-2}$. This measure can either be positive, e.g., when U and V are highly
correlated or anti-correlated, or negative, e.g., when $f_{U,V}(\mathbf{u}, \mathbf{v})$ has nearly zero mass in the
vicinity of the diagonal $\mathbf{u} - \mathbf{v} = 0$ and antidiagonal $\mathbf{u} + \mathbf{v} = 0$ regions.

Finally, the following simple inequalities give further insight into $J(f_{U,V})$.

$$J(f_{U,V}) \le |S_{n-2}| \left(\int f_{U|V}(\mathbf{w}|\mathbf{w}) f_{V|U}(\mathbf{w}|\mathbf{w}) \, d\mathbf{w}\right)^{1/2} \left(\int f_U(\mathbf{w}) f_V(\mathbf{w}) \, d\mathbf{w}\right)^{1/2}$$

$$\le H_2^{1/4}(f_{U|V}) \, H_2^{1/4}(f_{V|U}) \, H_2^{1/4}(f_U) \, H_2^{1/4}(f_V), \qquad (2.12)$$

where equality in the first inequality and the second inequality occur iff $f_{U,V}(\mathbf{u}, \mathbf{u}) =
f_U(\mathbf{u}) f_V(\mathbf{u})$ and $f_U(\mathbf{u}) = f_V(\mathbf{u})$, respectively. Hence $J(f_{U,V})$ is maximized when U and
V are independent. In the other direction, when restricted to the case of independent U
and V, $J(f_{U,V}) = H_2^{1/2}(f_U) H_2^{1/2}(f_V)$ is minimized when U and V are uniform over $S_{n-2}$.



3 Correlation screening 

Consider an experiment to compare p variables under treatments a and b, called $\mathbf{X}^a$ and $\mathbf{X}^b$.
The number n of sample realizations may be different in the two experiments but the number
and identity of the p variables are the same. These experiments produce two data matrices:
$\mathbb{X}^a$ and $\mathbb{X}^b$, which are $n_a \times p$ and $n_b \times p$ matrices, respectively. From these data matrices
extract the U-score matrices $\mathbb{U}^a$ and $\mathbb{U}^b$. Then, using the representation (2.1), we construct
$R^a = [\mathbb{U}^a]^T \mathbb{U}^a$ and $R^b = [\mathbb{U}^b]^T \mathbb{U}^b$, and call them sample auto-correlation matrices. When
$n_a = n_b$ we can also construct the sample cross-correlation matrix $R^{ab} = [\mathbb{U}^a]^T \mathbb{U}^b$. We are
primarily interested in the case $n_a, n_b \ll p$ so that the auto-correlation and cross-correlation
matrices will be rank deficient. Let the ij-th element of each of these matrices be denoted
as $r^a_{ij}$, $r^b_{ij}$, and $r^{ab}_{ij}$, respectively.

We distinguish between three types of correlation screening. We use the terms
auto-correlation and cross-correlation in analogy to auto-correlation and cross-correlation
functions in time series analysis.

Auto-correlation screening: The objective is to screen the p variables for those whose
maximal magnitude correlation exceeds a given threshold $\rho_a$. Specifically, for $i, j = 1, \ldots, p$,
the i-th variable passes the screen if:

$$\max_{j \ne i} |r^a_{ij}| > \rho_a. \qquad (3.1)$$

Cross-correlation screening: The objective is to screen the p variables for those whose
maximal magnitude cross-correlation exceeds a given threshold $\rho_{ab}$. Specifically, for
$i, j = 1, \ldots, p$, the i-th variable passes the screen if:

$$\max_{j} |r^{ab}_{ij}| > \rho_{ab}. \qquad (3.2)$$

Persistent auto-correlation screening: The objective is to screen the p variables for those
whose maximal magnitude auto-correlation in both treatments exceeds given thresholds $\rho_a$
and $\rho_b$, respectively. Specifically, for $i = 1, \ldots, p$, the i-th variable passes the screen if:

$$\max_{j \ne i} |r^a_{ij}| > \rho_a \quad \text{and} \quad \max_{j \ne i} |r^b_{ij}| > \rho_b. \qquad (3.3)$$

For each of the above three tests a discovery is declared if an index i passes the screen
and we denote by $N^a$, $N^{ab}$ and $N^{a \wedge b}$, respectively, the total number of discoveries. For large
p, these three tests display similar phase transition phenomena. For example, we illustrate in
Fig. 2 how the number $N^a$ of false auto-correlation discoveries experiences a sharp increase
as the threshold $\rho_a$ is reduced beyond a certain critical value $\rho_c$. This critical value depends
on the number p of variables, the number $n = n_a$ of samples, and the joint distribution of the
p variables. The behavior gets worse as n decreases relative to p, eventually overwhelming
the test with false discoveries for all but a narrow range of thresholds $\rho$ close to 1.
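The three screens (3.1)-(3.3) can be sketched as follows, assuming NumPy. The function names and the synthetic data are illustrative only, and the sketch forms the full sample correlation matrix, which is practical only for moderate p.

```python
import numpy as np

def unit_columns(X):
    """Center each column and scale it to unit Euclidean norm (Z-scores)."""
    C = np.asarray(X, dtype=float)
    C = C - C.mean(axis=0)
    return C / np.linalg.norm(C, axis=0)

def auto_screen(X, rho):
    """Indices i with max over j != i of |r_ij| > rho, as in (3.1)."""
    Z = unit_columns(X)
    A = np.abs(Z.T @ Z)
    np.fill_diagonal(A, 0.0)                      # exclude the trivial r_ii = 1
    return np.flatnonzero(A.max(axis=1) > rho)

def cross_screen(Xa, Xb, rho):
    """Indices i with max over j of |r^{ab}_ij| > rho, as in (3.2); needs n_a == n_b."""
    Rab = unit_columns(Xa).T @ unit_columns(Xb)
    return np.flatnonzero(np.abs(Rab).max(axis=1) > rho)

def persistent_screen(Xa, Xb, rho_a, rho_b):
    """Indices passing the auto-correlation screen in both treatments, as in (3.3)."""
    return np.intersect1d(auto_screen(Xa, rho_a), auto_screen(Xb, rho_b))

rng = np.random.default_rng(1)                    # illustrative synthetic data
Xa = rng.normal(size=(30, 8))
Xa[:, 1] = Xa[:, 0] + 0.01 * rng.normal(size=30)  # variables 0 and 1 correlated under a
Xb = rng.normal(size=(25, 8))
Xb[:, 2] = Xb[:, 1] + 0.01 * rng.normal(size=25)  # variables 1 and 2 correlated under b
found_a = auto_screen(Xa, 0.95)
found_persistent = persistent_screen(Xa, Xb, 0.95, 0.95)
```

In this toy example only variable 1 is strongly correlated with another variable under both treatments, so it is the only index expected to pass the persistent screen.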
[Figure 2: three histogram panels titled "n=50, p=500, rho_1=0.8", "n=25, p=500, rho_1=0.8" and
"n=10, p=500, rho_1=0.8"; horizontal axis: sample correlation value.]

Figure 2: Effect of number of samples n on the discoveries for a multivariate normal sample
where all but two of the p = 500 variables are mutually uncorrelated as n decreases over the
range 50, 25, 10. These two variables have a correlation coefficient equal to $\rho_1 = 0.8$. Shown
are histograms of the $p(p-1)/2$ distinct sample correlation coefficients in the correlation
matrix R excluding the diagonal coefficients. The arrows point to the locations of the
positive and negative correlation thresholds of an auto-correlation screening test that would
detect the variables having at least 0.8 correlation with probability not exceeding 0.5. An
increasing number of other sample correlations exceed this threshold as n decreases: these
false discoveries are overwhelming for small n.

In the next three subsections we develop theory to predict this phase transition behavior 
in terms of the mean number of discoveries. 

3.1 Discoveries in auto-correlation screening 

Here we give results for the mean number of discoveries $E[N^a]$ when screening for
threshold-exceeding correlations between columns of a single data matrix $\mathbb{X}^a$. For convenience here we
suppress "a" superscripts and subscripts.
We recall the quantities

7p = max {a^Mfcii}, 7]p = 2a^Mi,i (3.4) 
i<fc<p 

where $a_n = |S_{n-2}|$, $M_{k|1}$ is defined in (A.2) and $M_{1|1}$ is defined in (A.4). These quantities
are uniformly bounded over p when the joint density $f_{U_1, \ldots, U_p}$ of the U-scores is smooth and
strictly bounded between $(0, \infty)$. For example if the joint density of the Z-scores is a finite
mixture of von Mises-Fisher densities on the sphere $S_{n-2}$ with strictly bounded concentration
parameters, then $\gamma_p$ and $\eta_p$ are uniformly bounded.

Proposition 1. Let the $n \times p$ data matrix $\mathbb{X}$ have associated U-scores $\mathbb{U}$ and assume that
n > 2. Assume that $\gamma_p$ and $\eta_p$ are uniformly bounded. Let the sequence $\{\rho_p\}_p$ of correlation
thresholds be such that $\rho_p \to 1$ and $p(p-1)(1 - \rho_p^2)^{(n-2)/2} \to e_n$ for some finite constant
$e_n$. Then the mean number of discoveries generated from the auto-correlation screen (3.1)
satisfies:

$$\left| E[N] - \kappa_n J(\overline{f_{U_*, U_{*-\bullet}}}) \right| \le O(p^{-1}) + O\left(\sqrt{1 - \rho_p}\right), \qquad (3.5)$$

where $\kappa_n = a_n e_n / (n - 2)$ and

$$\overline{f_{U_*, U_{*-\bullet}}}(\mathbf{u}, \mathbf{v}) = \frac{1}{p} \sum_{i=1}^{p} \frac{1}{p-1} \sum_{j \ne i} \left( \tfrac{1}{2} f_{U_i, U_j}(\mathbf{u}, \mathbf{v}) + \tfrac{1}{2} f_{U_i, U_j}(\mathbf{u}, -\mathbf{v}) \right), \qquad (3.6)$$

is the average of the pairwise U-score density. Assume in addition that the joint density of the
U-scores satisfies the weak dependency condition: for some $k = o(p)$ the average dependency
coefficient $\|\Delta_{p,k}\|_1$ (A.13) converges to zero. Then $P(N > 0) \to 1 - \exp(-\Lambda/2)$ where $\Lambda$ is
the limiting value of E[N] specified by (3.5).

In the proof of Prop. 1 we establish the stated limit on $P(N > 0)$ by showing that N is
dominated by the number $N_e$ of edges in the correlation graph and that $N_e$ converges to a
Poisson random variable $N^*$ with rate $\Lambda/2$ as $p \to \infty$. The rate of convergence of $P(N > 0)$
to the stated limit is of order max{(/c/p)^, ||Ap^fc||i}. 
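The Poisson limit suggests a simple way to pick a threshold for a target probability of any false discovery. The sketch below is an illustration under stated assumptions: it replaces the limiting value $\Lambda$ by the finite-p quantity $p(p-1) P_0(\rho, n) J$, takes J = 1 by default (the i.i.d. uniform U-score case), and uses a bisection search; the function names, the integrator and the tolerance are hypothetical choices.

```python
import math

def P0(rho, n, steps=4000):
    """Midpoint-rule evaluation of (2.6) with the constant (2.7); assumes n >= 4."""
    a_n = 2.0 * math.exp(math.lgamma((n - 1) / 2) - math.lgamma((n - 2) / 2)) / math.sqrt(math.pi)
    h = (1.0 - rho) / steps
    total = sum((1.0 - (rho + (k + 0.5) * h) ** 2) ** ((n - 4) / 2) for k in range(steps))
    return a_n * total * h

def prob_any_discovery(rho, n, p, J=1.0):
    """Poisson approximation 1 - exp(-Lambda/2), with Lambda taken as p(p-1) P0 J."""
    lam = p * (p - 1) * P0(rho, n) * J
    return 1.0 - math.exp(-lam / 2.0)

def threshold_for_level(alpha, n, p, J=1.0, tol=1e-6):
    """Smallest threshold (up to tol) whose approximate P(N > 0) is at most alpha."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_any_discovery(mid, n, p, J) > alpha:
            lo = mid          # too many false discoveries: raise the threshold
        else:
            hi = mid          # level met: try a smaller threshold
    return hi

rho_05 = threshold_for_level(0.05, n=10, p=500)
```

The bisection relies on the approximate probability being monotonically decreasing in the threshold, which holds because $P_0(\rho, n)$ decreases in $\rho$.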

In terms of the limiting value (3.5) of E[N] the case where the columns of $\mathbb{X}$ have
spherically contoured distribution is of special interest. In this case the U-scores are i.i.d.
uniformly distributed and $J(\overline{f_{U_*, U_{*-\bullet}}}) = 1$. Prop. 1 asserts the weaker necessary and
sufficient condition: $J(\overline{f_{U_*, U_{*-\bullet}}}) = 1$ if and only if the averaged pairwise U-score density
(3.6) is i.i.d. uniform over $S_{n-2} \times S_{n-2}$. We develop this further in the next paragraph.
n          550     500     450     150     100     50      10      8       6
$\rho_c$   0.188   0.197   0.207   0.344   0.413   0.559   0.961   0.988   0.9997

Table 1: Values of the critical threshold $\rho_c$ where phase transition occurs in Fig. 3. These
values were determined using asymptotic approximation (3.11).

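First observe that the marginal densities, obtained by integrating $\overline{f_{U_*, U_{*-\bullet}}}(\mathbf{u}, \mathbf{v})$ over $\mathbf{v}$
and $\mathbf{u}$, are identical and equal to the average U-score density

$$\overline{f_{U_*}}(\mathbf{u}) = \frac{1}{p} \sum_{i=1}^{p} \left( \tfrac{1}{2} f_{U_i}(\mathbf{u}) + \tfrac{1}{2} f_{U_i}(-\mathbf{u}) \right). \qquad (3.7)$$

Therefore inequality (2.12) implies that

$$J(\overline{f_{U_*, U_{*-\bullet}}}) \le H_2^{1/4}(\overline{f_{U_* | U_{*-\bullet}}}) \, H_2^{1/4}(\overline{f_{U_{*-\bullet} | U_*}}) \, H_2^{1/2}(\overline{f_{U_*}}), \qquad (3.8)$$

with equality iff $\overline{f_{U_*, U_{*-\bullet}}}(\mathbf{u}, \mathbf{u}) = (\overline{f_{U_*}}(\mathbf{u}))^2$, which is satisfied when the U-scores are
independent. Second observe that the extremal property (2.9) of $H_2(f)$ implies that, among all such
i.i.d. U-score distributions, E[N] will be smallest when the marginal $\overline{f_{U_*}}$ is uniform, which
is satisfied when the U-scores are uniform on $S_{n-2}$.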

In the case that $\overline{f_{U_*, U_{*-\bullet}}}(\mathbf{u}, \mathbf{u}) = |S_{n-2}|^{-2}$, (3.5) implies the asymptotic approximation
for finite p and $\rho < 1$:

$$E[N] \approx \kappa_n \approx p(p-1) P_0(\rho, n), \qquad (3.9)$$

since $p(p-1) P_0(\rho_p, n) \to \kappa_n$ as $p \to \infty$. This case holds, for example, when the rows of $\mathbb{X}$
are i.i.d. with diagonal elliptical distribution. In this case the U-scores are i.i.d. uniform
and the mean number of discoveries has the exact expression

$$E[N] = p\left(1 - (1 - P_0(\rho, n))^{p-1}\right). \qquad (3.10)$$
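The exact expression (3.10) and the approximation (3.9) are easy to compare numerically in the i.i.d. uniform U-score case. The following standard-library sketch is illustrative; the integrator and the function names are assumptions, and n >= 4 is assumed so that the integrand in (2.6) is bounded.

```python
import math

def P0(rho, n, steps=20000):
    """Midpoint-rule evaluation of (2.6) with the constant (2.7); assumes n >= 4."""
    a_n = 2.0 * math.exp(math.lgamma((n - 1) / 2) - math.lgamma((n - 2) / 2)) / math.sqrt(math.pi)
    h = (1.0 - rho) / steps
    total = sum((1.0 - (rho + (k + 0.5) * h) ** 2) ** ((n - 4) / 2) for k in range(steps))
    return a_n * total * h

def mean_discoveries_exact(rho, n, p):
    """Exact mean number of discoveries (3.10) for i.i.d. uniform U-scores."""
    q = min(P0(rho, n), 1.0)                 # guard against round-off slightly above one
    return p * (1.0 - (1.0 - q) ** (p - 1))

def mean_discoveries_approx(rho, n, p):
    """Small-P0 approximation p(p-1) P0 appearing in (3.9)."""
    return p * (p - 1) * P0(rho, n)

exact = mean_discoveries_exact(0.99, 10, 500)
approx = mean_discoveries_approx(0.99, 10, 500)
```

The two values agree closely when $p \, P_0(\rho, n)$ is small; as the threshold decreases, the exact expression saturates at p while the approximation keeps growing.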
[Figure 3: plot of E[N]/p versus the screening threshold $\rho$; horizontal axis from 0.2 to 1.0.]

Figure 3: Normalized mean number of discoveries E[N]/p for the case that the rows
of the data matrix are normal with diagonal covariance. Nine curves are plotted as a
function of the screening threshold $\rho$ for p = 500 and nine values of n. The values
n = 550, 500, 450, 150, 100, 50, 10, 8, 6 index the curves from left to right.

In Fig. 3 we plot the exact expression (3.10) for the normalized mean number of
discoveries as a function of $\rho$ and n for p = 500. Each curve, decreasing monotonically as $\rho$
increases, is a plot of E[N]/p for given n. Since the true covariance matrix is diagonal all
discoveries are false discoveries. We make several observations:

• The curves in Fig. 3 cluster into three groups. From left to right: $n \in \{550, 500, 450\}$,
$n \in \{150, 100, 50\}$ and $n \in \{10, 8, 6\}$. The effect on the curves of varying n is more
pronounced for small n than for larger n.

• The curves illustrate a phase transition phenomenon in the mean number of false
positives as a function of the threshold $\rho$. For given n there is a critical point $\rho_c$ such
that as $\rho$ approaches $\rho_c$ from above the mean number of false positives is small and
increases very slowly. As $\rho$ continues to decrease in the vicinity of $\rho_c$ the mean number
of false positives increases rapidly to p.

• The rapidity of the phase transition varies as a function of n and is related to the slope
of the curve near its inflection point. The most rapid phase transitions occur when n
is very large or very small.

The phase transition threshold value $\rho_c$ can be predicted by the knee of the curve in Fig.
3, defined as the maximum value $\rho$ at which the slope of the curve equals minus one. This
choice of critical slope is common in the physics literature. One could choose a different
critical slope value to define $\rho_c$ but this would only have a minor effect (a change in the
quantity in (3.10) by a constant scale factor). The slope of the large p approximation 
(3.5) to E[N] is 

$$dE[N]/d\rho = -p(p-1)(1 - \rho^2)^{(n-4)/2} a_n J(\overline{f_{U_*, U_{*-\bullet}}}),$$

where $a_n$ is given in (2.7). Define the critical value as $\rho_c = \max\{\rho : p^{-1} dE[N]/d\rho = -1\}$.
For n > 4 this maximization can be solved to give the expression

$$\rho_c = \sqrt{1 - c_n (p-1)^{-2/(n-4)}}, \qquad (3.11)$$

where $c_n = (a_n J(\overline{f_{U_*, U_{*-\bullet}}}))^{-2/(n-4)}$. The accuracy of $\rho_c$ defined in (3.11) can be appreciated
by comparing the predicted $\rho_c$ in Table 1 to the transition points of the associated curves in
Fig. 3.
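Expression (3.11) is straightforward to evaluate. The sketch below, using only the standard library, assumes J = 1 by default (i.i.d. uniform U-scores) and takes $a_n$ from (2.7); the function names are hypothetical. It is meant to show how the critical threshold moves with n and p rather than to reproduce the entries of Table 1 digit for digit.

```python
import math

def a_n(n):
    """Normalizing constant (2.7), computed through log-gamma to avoid overflow."""
    log_ratio = math.lgamma((n - 1) / 2) - math.lgamma((n - 2) / 2)
    return 2.0 * math.exp(log_ratio) / math.sqrt(math.pi)

def critical_threshold(n, p, J=1.0):
    """Evaluate the critical threshold (3.11); requires n > 4."""
    if n <= 4:
        raise ValueError("expression (3.11) requires n > 4")
    exponent = -2.0 / (n - 4)
    c_n = (a_n(n) * J) ** exponent
    return math.sqrt(1.0 - c_n * (p - 1) ** exponent)

# Critical thresholds for p = 500 variables and a few sample sizes.
thresholds = {n: critical_threshold(n, 500) for n in (6, 10, 50, 100)}
```

Under these assumptions the threshold moves toward one as n decreases for fixed p, in line with the discussion of Fig. 3.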



3.2 Discoveries in cross-correlation screening 

Next we turn to screening for threshold-exceeding cross-correlations between columns of two
data matrices $\mathbb{X}^a$ and $\mathbb{X}^b$. The theory in the previous section could be directly used by
applying Prop. 1 to the concatenated $n \times 2p$ data matrix

$$\mathbb{X} = [\mathbb{X}^a, \mathbb{X}^b].$$

However, the convergence rates and phase transition thresholds would be significantly worse
than before due to the inflation of the number of variables from p to 2p. Furthermore, if we
thresholded the entire $2p \times 2p$ sample correlation matrix $\mathbb{X}^T \mathbb{X}$ we would expect that in most
practical problems the auto-correlation discoveries in the diagonal blocks would dominate
the cross-correlation discoveries in the off-diagonal blocks. The following result is useful
when one is only interested in the cross-correlation discoveries.

Define $\gamma_p^{ab}$ and $\eta_p^{ab}$ similarly to (3.4) except that $M_{k|1}$ and $M_{1|1}$ are replaced by $M^{ab}_{k|1}$ and
$M^{ab}_{1|1}$ as defined in (A.5) and (A.6).

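Proposition 2. Let the $n \times p$ data matrices $\mathbb{X}^a$ and $\mathbb{X}^b$ have associated U-scores $\mathbb{U}^a$ and
$\mathbb{U}^b$ and assume that n > 2. Assume that $\gamma_p^{ab}$ and $\eta_p^{ab}$ are uniformly bounded. Let the sequence
$\{\rho_p\}_p$ of cross-correlation thresholds be such that $\rho_p \to 1$ and $p^2 (1 - \rho_p^2)^{(n-2)/2} \to e_n$ for some
finite constant $e_n$. Then the mean number of discoveries generated from the cross-correlation
screen (3.2) satisfies:

$$\left| E[N^{ab}] - \kappa_n J(\overline{f_{U^a_*, U^b_\bullet}}) \right| \le O(p^{-1}) + O\left(\sqrt{1 - \rho_p}\right), \qquad (3.12)$$

where $\kappa_n = a_n e_n / (n - 2)$ and

$$\overline{f_{U^a_*, U^b_\bullet}}(\mathbf{u}, \mathbf{v}) = \frac{1}{p} \sum_{i=1}^{p} \frac{1}{p} \sum_{j=1}^{p} \left( \tfrac{1}{2} f_{U^a_i, U^b_j}(\mathbf{u}, \mathbf{v}) + \tfrac{1}{2} f_{U^a_i, U^b_j}(\mathbf{u}, -\mathbf{v}) \right). \qquad (3.13)$$

Assume in addition that the joint density of the U-scores satisfies the weak cross-dependency
condition: for some $k = o(p)$ the average dependency coefficient $\|\Delta^{ab}_{p,k}\|_1$ (A.13) converges to
zero. Then $P(N^{ab} > 0) \to 1 - \exp(-\Lambda)$ where $\Lambda$ is the limiting value of $E[N^{ab}]$ specified by
(3.12).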

The critical phase transition threshold for the case of cross-correlation screening can be 
derived in a similar manner to the previously considered case of auto-correlation screening. 
The critical threshold is given by 

$$\rho_c = \sqrt{1 - c_n^{ab}\,(p-1)^{-2/(n-4)}}, \qquad (3.14)$$

where $c_n^{ab} = \left(a_n J(f_{U^a_*,U^b_\bullet})\right)^{-2/(n-4)}$ and $a_n$ is given in (2.7).

3.3 Discoveries in persistent-correlation screening

Finally we treat screening for variables whose auto-correlation exceeds a threshold in both
of two treatments $a$ and $b$. Recall that in this problem there are two correlation thresholds
$\rho^a$ and $\rho^b$ that are respectively applied to the $p \times p$ sample correlation matrices derived from
the independent data matrices $X^a$ and $X^b$. As discussed below, Prop. 1 could be directly
applied to this problem but it would result in an uninteresting degenerate limit. A more
interesting result is the following.

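Proposition 3. Let the $n_a \times p$ data matrix $X^a$ and the $n_b \times p$ data matrix $X^b$ be statistically
independent and assume that the associated U-scores from each treatment satisfy the same
conditions assumed for $U$ in Prop. 1. Let the sequences $\{\rho_p^a\}_p$ and $\{\rho_p^b\}_p$ be such that $\rho_p^a \to 1$
and $\rho_p^b \to 1$ while $p^{1/2}(p-1)\left(1 - (\rho_p^a)^2\right)^{(n_a-2)/2} \to e_{n_a}$ and $p^{1/2}(p-1)\left(1 - (\rho_p^b)^2\right)^{(n_b-2)/2} \to e_{n_b}$,
for some finite constants $e_{n_a}, e_{n_b}$. Then the mean number of discoveries $N^{a \wedge b}$ generated by
the persistent-correlation screen (3.3) satisfies

$$\left| E[N^{a \wedge b}] - \kappa_{n_a,n_b}\,\frac{1}{p}\sum_{i=1}^{p} J(f_{U_i^a,U_{*-i}^a})\,J(f_{U_i^b,U_{*-i}^b}) \right| \le O\left(\max\left\{(k/p)^2,\ (k/p)\,p^{-1/2},\ p^{-1},\ \|\Delta^a_{p,k}\|_1,\ \|\Delta^b_{p,k}\|_1\right\}\right), \qquad (3.15)$$

where $\kappa_{n_a,n_b} = e_{n_a} e_{n_b} a_{n_a} a_{n_b} (n_a - 2)^{-1} (n_b - 2)^{-1}$ and, for $U \in \{U^a, U^b\}$, $f_{U_i,U_{*-i}}$ is the leave-one-out
average of the U-score pairwise densities:

$$f_{U_i,U_{*-i}}(u,v) = \frac{1}{p-1}\sum_{j \neq i}\left(\tfrac{1}{2} f_{U_i,U_j}(u,v) + \tfrac{1}{2} f_{U_i,U_j}(u,-v)\right). \qquad (3.16)$$

Assume in addition that the U-score densities associated with $X^a$ and $X^b$ each satisfy the
weak dependency condition stated in Prop. 1. Then $P(N^{a \wedge b} > 0) \to 1 - \exp(-\Lambda)$ where $\Lambda$
is the limiting value of $E[N^{a \wedge b}]$ specified in (3.15).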

In Prop. 3 the assumed rates of convergence of $\rho_p^a, \rho_p^b$ are slower (note the different factor
$p^{1/2}$) than the rates assumed in Props. 1 and 2. A slower rate is required since persistent
correlation discoveries are rarer than auto-correlation discoveries. In particular, when the
correlation thresholds satisfy the hypotheses of Prop. 3 the individual per-treatment means
$E[N^a]$ and $E[N^b]$ do not converge. However, it can be shown that $p^{-1/2}E[N^a]$ and $p^{-1/2}E[N^b]$
do converge (see Corollary 1 in Appendix/Supplemental Section). Conversely, if the individual
per-treatment means converge to finite values then the mean number of persistent
discoveries $E[N^{a \wedge b}]$ converges to zero, resulting in an uninteresting limit.

Assume that one or the other of the factors in the summand of (3.15) does not depend on
$i$:

$$J(f_{U_i^a,U_{*-i}^a}) = J(f_{U_*^a,U_{*-\bullet}^a}), \quad \text{or} \quad J(f_{U_i^b,U_{*-i}^b}) = J(f_{U_*^b,U_{*-\bullet}^b}). \qquad (3.17)$$

When (3.17) holds we say that the pairwise dependencies are incoherent across treatments $a$
and $b$. A sufficient condition for incoherence is pairwise independent U-scores with identical
marginal densities $f_{U_i^a} = f_{U_j^a}$ and $f_{U_i^b} = f_{U_j^b}$. In the incoherent case the limit (3.15) takes
on a simpler intuitive form

$$\frac{1}{p}\sum_{i=1}^{p} J(f_{U_i^a,U_{*-i}^a})\,J(f_{U_i^b,U_{*-i}^b}) = J(f_{U_*^a,U_{*-\bullet}^a})\,J(f_{U_*^b,U_{*-\bullet}^b}).$$

Define $\kappa_a = p^{1/2} e_{n_a} a_{n_a}/(n_a - 2)$ and $\kappa_b = p^{1/2} e_{n_b} a_{n_b}/(n_b - 2)$. Then, in view of the limit
(3.5) of Prop. 1, under the condition (3.17) the limit in (3.15) gives the large $p$ approximation

$$E[N^{a \wedge b}] \approx \frac{E[N^a]\,E[N^b]}{p}. \qquad (3.18)$$

The right side of (3.18) is equal to the limit in (3.15) when the pairwise dependencies
are incoherent across treatments $a$ and $b$.

Relation (3.18) is a well known asymptotic relation for the number of matches in two
independent Bernoulli sequences of length $p$. In this case $N^{a \wedge b}$ is the number of successes
common to the pair of sequences and $N^a$, $N^b$ are the numbers of successes in each sequence;
the result is easily established for large $p$ using Stirling approximations and assuming small
probabilities of success. It is interesting that in persistency screening it is sufficient that
only one of the two treatments produce identically distributed U-scores for (3.18) to hold.
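
The Bernoulli interpretation of (3.18) can be checked with a small Monte Carlo experiment. The sketch below draws two independent Bernoulli sequences of length $p$ and compares the average number of common successes with the product of the per-sequence averages divided by $p$. The sequence length, success probabilities, replicate count and seed are arbitrary illustrative choices, not values from the paper.

```python
import random


def mean_matches(p, q_a, q_b, reps=200, seed=0):
    """Monte Carlo means of N_a, N_b and the number of common successes."""
    rng = random.Random(seed)
    tot_a = tot_b = tot_ab = 0
    for _ in range(reps):
        for _ in range(p):
            a = rng.random() < q_a  # success in sequence a
            b = rng.random() < q_b  # independent success in sequence b
            tot_a += a
            tot_b += b
            tot_ab += a and b
    return tot_a / reps, tot_b / reps, tot_ab / reps


p = 2000
n_a, n_b, n_ab = mean_matches(p, 0.05, 0.03)
# Left: simulated mean number of matches; right: the product form of (3.18).
print(n_ab, n_a * n_b / p)
```

For independent sequences the exact mean number of matches is $p\,q_a q_b$, so the two printed numbers should agree up to Monte Carlo error.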

We next turn to the problem of selecting the thresholds $\rho^a$ and $\rho^b$. These thresholds
affect the asymptotic mean number of discoveries (3.15) only through the limits $e_{n_a}$ and $e_{n_b}$
defined in Prop. 3. When relation (3.18) holds, it can be shown that if we fix the normalized
average rate of per-treatment discoveries $(E[N^a] + E[N^b])\,p^{-1/2}/2$, $E[N^{a \wedge b}]$ is maximized when
the thresholds $\rho_a$ and $\rho_b$ are chosen to make $E[N^a] = E[N^b]$. These optimal thresholds are
related by

$$\rho_a = \sqrt{1 - \left( (1-\rho_b^2)^{(n_b-2)/2}\; \frac{(n_a-2)\,a_{n_b}\,J(f_{U_*^b,U_{*-\bullet}^b})}{(n_b-2)\,a_{n_a}\,J(f_{U_*^a,U_{*-\bullet}^a})} \right)^{2/(n_a-2)}}.$$

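
A minimal sketch of how thresholds can be matched across treatments with different sample sizes follows. It equates the leading-order per-pair discovery rates $a_n (n-2)^{-1}(1-\rho^2)^{(n-2)/2} J$ of the two treatments, which is the large-threshold form of $P_0 J$ used in the proofs. The constant $a_n = 2/B((n-2)/2, 1/2)$ and the default $J = 1$ are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def a_n(n):
    """Assumed constant a_n = 2 / B((n-2)/2, 1/2)."""
    log_ratio = math.lgamma((n - 1) / 2) - math.lgamma((n - 2) / 2)
    return 2.0 * math.exp(log_ratio) / math.sqrt(math.pi)


def per_pair_rate(rho, n, J=1.0):
    """Leading-order rate a_n (n-2)^{-1} (1 - rho^2)^{(n-2)/2} J."""
    return a_n(n) / (n - 2) * (1.0 - rho ** 2) ** ((n - 2) / 2) * J


def matching_threshold(rho_b, n_a, n_b, J_a=1.0, J_b=1.0):
    """Threshold rho_a whose per-pair rate equals that of (rho_b, n_b)."""
    target = per_pair_rate(rho_b, n_b, J_b)
    one_minus_sq = (target * (n_a - 2) / (a_n(n_a) * J_a)) ** (2.0 / (n_a - 2))
    if one_minus_sq >= 1.0:
        raise ValueError("no threshold in (0, 1) matches the requested rate")
    return math.sqrt(1.0 - one_minus_sq)


rho_a = matching_threshold(0.8, n_a=20, n_b=23)
print(round(rho_a, 3))
```

With fewer samples in treatment $a$ than in treatment $b$, the matched threshold $\rho_a$ comes out larger than $\rho_b$, since sample correlations are more variable when $n$ is small.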
A general closed form expression for the critical phase transition threshold for persistent-correlation
screening has not been found. However, for the special case of pairwise i.i.d.
U-scores and equal number $n = n_a = n_b$ of samples, the following expression for the critical
threshold holds

$$\rho_c = \sqrt{1 - c_n^{a \wedge b}\,(p-1)^{-2/(n-4)}}, \qquad (3.19)$$

where $c_n^{a \wedge b} = \left(a_n \left(H_2(f_{U^a})\,H_2(f_{U^b})\right)^{1/2}\right)^{-2/(n-4)}$ and $a_n$ is given in (2.7).

Prop. 3 generalizes to more than two treatments. Assume there are $m$ different independent
treatments $t_1, \ldots, t_m$. Then the correlation thresholds $\rho_p^{t_j}$ should be selected such that
they converge to one and $p^{1/m}(p-1)\left(1 - (\rho_p^{t_j})^2\right)^{(n_{t_j}-2)/2}$ converges to a finite constant, say
$e_{n_j}$, $j = 1, \ldots, m$. In this case one obtains the same type of limit of the false positive rate
as in Prop. 3 under similar conditions of weak dependence of the variables within each
treatment. The mean number of discoveries will converge to

$$\lim_{p \to \infty} E[N^{t_1 \wedge \cdots \wedge t_m}] = \kappa_n\, p^{-1} \sum_{i=1}^{p} \prod_{j=1}^{m} J(f_{U_i^{t_j},U_{*-i}^{t_j}}) \qquad (3.20)$$

where $\kappa_n = \prod_{j=1}^{m} e_{n_j} a_{n_j}/(n_j - 2)$. If $\min_j\{n_j\} > 2(m+1)$ and $\dot{M}_{1|1}$ (defined in (A.4)) is bounded,
the rate of convergence in (3.20) will be dominated by the treatment with the fewest samples
and it will be of order $O(p^{-2/(n-2)})$ where $n = \min_j\{n_j\}$. Otherwise the rate of convergence
will be $O(p^{-1/m})$. When the factors in the summand of the limit (3.20) do not depend on $i$
a relation analogous to (3.18) holds: $E[N^{t_1 \wedge \cdots \wedge t_m}] \approx (E[N^{t_1}] \cdots E[N^{t_m}])/p^{m-1}$.



4 Correlation screening with sparse dependency 

In this section we specialize to the class of $q$-sparse $p \times p$ covariances, defined as row-sparse
covariance matrices of degree $q$ that can be reduced to a single $q \times q$ block of correlated
variables using row-column permutations. Under this $q$-sparse condition, to order $O((q/p)^2)$
the limits stated in Propositions 1-3 do not depend on the unknown joint sample distribution.
Therefore, these propositions can be used to determine universal screening thresholds that
approximately control any desired level of false positive rate. We treat each of the three
correlation screening procedures separately.

4.1 Sparse auto-correlation screening 

Let the rows of $X$ be i.i.d. Under the assumption that the columns of $X$ have $q$-sparse
covariance, the U-scores $\{U_i\}_{i=1}^p$ are i.i.d. uniform except for a number of $q < p$ mutually
dependent U-scores $\{U_i\}_{i=1}^q$ that are independent of the rest. The mean number of
discoveries in Prop. 1 becomes, to order at most $O(\max\{p^{-1}, p^{-2/(n-2)}\})$,

$$E[N] = \kappa_n\left(1 + \frac{q(q-1)}{p(p-1)}\left(J(f_{U_*,U_{*-\bullet}}) - 1\right)\right),$$

where $f_{U_*,U_{*-\bullet}}$ is the average over the joint distributions of distinct and mutually dependent
U-scores. Therefore, to order at most $O(\max\{(q/p)^2, p^{-1}, p^{-2/(n-2)}\})$ the mean number of
discoveries is equal to $\kappa_n$.
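
The way a universal threshold can be obtained in the sparse auto-correlation case is sketched below. The code rests on three assumptions drawn from the surrounding material: the mean number of discoveries is approximated by $p(p-1)P_0$ (that is, $J = 1$); the Poisson rate is one half of that mean, as in the proof of Prop. 1; and $P_0$ is read as the probability that the magnitude of a null sample correlation exceeds $\rho$, computed by numerical integration. Function names and numerical settings are hypothetical.

```python
import math


def null_exceedance(rho, n, steps=2000):
    """P_0: probability that |r| > rho for a null sample correlation, n >= 3.

    With r = cos(theta), P_0 = a_n * integral_0^{arccos(rho)} sin(theta)^(n-3),
    evaluated here with the composite Simpson rule (steps must be even).
    """
    a = 2.0 * math.exp(math.lgamma((n - 1) / 2) - math.lgamma((n - 2) / 2))
    a /= math.sqrt(math.pi)
    h = math.acos(rho) / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * math.sin(i * h) ** (n - 3)
    return a * total * h / 3.0


def auto_screen_fwer(rho, p, n):
    """Poisson approximation 1 - exp(-Lambda), Lambda = p (p - 1) P_0 / 2."""
    lam = p * (p - 1) * null_exceedance(rho, n) / 2.0
    return -math.expm1(-lam)


def fwer_threshold(alpha, p, n, iters=60):
    """Smallest threshold (by bisection) whose approximate FWER is <= alpha."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if auto_screen_fwer(mid, p, n) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


rho = fwer_threshold(0.05, p=500, n=20)
print(round(rho, 3), auto_screen_fwer(rho, 500, 20))
```

Because the approximation does not involve the unknown joint distribution, the same threshold applies to any $q$-sparse model with the given $p$ and $n$, up to the error orders stated above.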

4.2 Sparse cross-correlation screening

Let the rows of $X$ be i.i.d. Assume that the cross correlation matrix is block-sparse in the
sense that there exists a column permutation that puts the cross-correlation matrix into a
form having most entries zero except for a small $q_a \times q_b$ non-zero off diagonal block. Then the
mean number of discoveries in Prop. 2 becomes, to order at most $O(\max\{p^{-1}, p^{-2/(n-2)}\})$,

$$E[N^{ab}] = \kappa_n^{ab}\left(1 + \frac{q_a q_b}{p^2}\left(J(f_{U_*^a,U_\bullet^b}) - 1\right)\right).$$

Therefore, with $q = \max\{q_a, q_b\}$, to order $O(\max\{(q/p)^2, p^{-1}, p^{-2/(n-2)}\})$ the mean number
of discoveries is equal to $\kappa_n^{ab}$.

4.3 Sparse persistent-correlation screening 

Let the rows of $X$ be i.i.d. Assume that under treatment $a$ all variables are mutually uncorrelated
except for those variables with indices in the set $Q_a$. Likewise define the index set $Q_b$
of variables having non-zero correlation under treatment $b$. The mean number of discoveries
in Prop. 3 becomes, to order O (max



+ 1 ^) (^) A - 1) . (^) i^4^ 1 - 1) 



P J \ P ~ ^ J \ P ~ ^ 

where = J{f\ja fja ) and similarly for J^. In particular, to order O (max }) , 

if there is a ^-sparse covariance under each treatment and there are common persistent cor- 
relations among the q variables the 



1) 

p{p - 1) 
while if only one of the treatments, say treatment $a$, produces $q$-sparse covariance



(1 + 



( 



q{q - 1) 
P{P - 1) 




In particular, in the latter case to order O (max ^^^,p '^^^^ ^^}) the simple product 

representation (3.18) holds. 



5 Numerical experiments 



To illustrate the practical utility of the theory developed in the previous sections we present 
two numerical studies. First simulations were performed that show our false positive rate 
approximations give accurate finite p approximations to empirically determined error rates 
in a sparse example. Second, these approximations are used to perform correlation screening 
on experimental gene expression microarray data. 

5.1 Simulation results 

We used the asymptotic theory to specify suitable correlation thresholds that ensure specified
familywise error rates (FWER): false positives (Type I) and false negatives (Type II). We
simulated a problem of persistent correlation screening over a pair of treatments for the
presence of a few strongly correlated variables in a nearly diagonal covariance matrix.
The two treatments were balanced ($n_a = n_b$), the rows of $X$ were i.i.d. multivariate normal
and the covariance matrix was diagonal except for a $2 \times 2$ block corresponding to a pair of
correlated variables.

For given $p$ and $n_a$, $n_b$, the approximation to $P(N^{a \wedge b} > 0)$ given in Prop. 3 was used to
select thresholds $\rho_p^a, \rho_p^b$ that guarantee a Type I FWER of level $\alpha$. Once this threshold was
determined, the Type II FWER was approximated using a bias corrected normal approximation
to the Fisher-Z transformation of the non-zero correlations: $Z_{ij} = \frac{1}{2}\log\frac{1+r_{ij}}{1-r_{ij}}$; for $n$
the number of samples, $Z_{ij}$ is approximately normally distributed with mean and variance [1]

$$E[Z_{ij}] = \frac{1}{2}\log\frac{1+\rho_{ij}}{1-\rho_{ij}} + \frac{\rho_{ij}}{2(n-1)}, \qquad \mathrm{var}(Z_{ij}) = (n-3)^{-1}.$$

n \ α     0.010        0.025        0.050        0.075        0.100
10        0.98\0.96    0.98\0.96    0.98\0.95    0.98\0.95    0.98\0.95
15        0.94\0.89    0.94\0.88    0.93\0.87    0.93\0.87    0.93\0.87
20        0.89\0.82    0.89\0.81    0.88\0.80    0.88\0.80    0.88\0.79
25        0.85\0.76    0.84\0.75    0.84\0.74    0.83\0.74    0.83\0.73
30        0.81\0.72    0.80\0.70    0.79\0.70    0.79\0.69    0.79\0.69
35        0.77\0.67    0.76\0.66    0.76\0.65    0.75\0.65    0.75\0.64

Table 2: Minimum detectable correlation and level-$\alpha$ threshold (given as entry $\rho_1 \backslash \rho$ in
table) for persistent correlation screening as a function of number of samples $n$ (rows) and
familywise false positive level $\alpha$ (columns) for $p = 500$ and $\beta = 0.8$. The number of samples
in each treatment is identical ($n = n_a = n_b$). The false positive rate approximation in Prop.
3 was used to determine the required level-$\alpha$ threshold $\rho$. With this value of $\rho$ the minimum
detectable correlation $\rho_1$ was determined using a bias corrected normal approximation to the
Fisher-Z transformation of the sample correlation.


These approximations to Type I and Type II error rates were combined to produce Table
2. This table illustrates how one can use the theory to predict the required sample sizes and
the required threshold to achieve a desired false positive rate $\alpha$. The minimal detectable
correlation is defined using the aforementioned theoretical FWER approximations as the
minimum value of the true correlation for which the presence of a persistent correlation is
detected with probability at least $\beta$ and false alarm probability $\alpha$. For example, with the
$p = 500$ variables assumed in generating the table, at least $n = 35$ samples are required for
reliable detection of a persistent magnitude correlation less than or equal to $\rho_1 = 0.77$ at the
prescribed $(\alpha, \beta) = (0.01, 0.8)$ false positive and true positive levels.
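
The Type II calculation can be sketched as follows, assuming the bias corrected Fisher-Z normal approximation with mean $\tanh^{-1}(\rho) + \rho/(2(n-1))$ and variance $(n-3)^{-1}$. Two simplifications are assumptions made here for illustration: the correlated pair has the same true correlation in both treatments, and a persistent detection is declared when the pair's sample correlation exceeds the threshold in each of the two independent, balanced treatments, so that the per-treatment detection probability is squared. Function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def detection_prob(rho_true, threshold, n):
    """P(|r| > threshold) under the bias corrected Fisher-Z approximation."""
    mean = math.atanh(rho_true) + rho_true / (2.0 * (n - 1))
    sd = 1.0 / math.sqrt(n - 3)
    z_thr = math.atanh(threshold)
    upper = 1.0 - normal_cdf((z_thr - mean) / sd)
    lower = normal_cdf((-z_thr - mean) / sd)
    return upper + lower


def persistent_power(rho_true, threshold, n):
    """Detection probability in both of two independent treatments."""
    return detection_prob(rho_true, threshold, n) ** 2


def min_detectable(threshold, n, beta=0.8, iters=60):
    """Smallest true correlation with persistent power >= beta (bisection)."""
    lo, hi = 0.0, 1.0 - 1e-9
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if persistent_power(mid, threshold, n) < beta:
            lo = mid
        else:
            hi = mid
    return hi


# Table 2 entry for n = 35, alpha = 0.01: threshold 0.67 (rounded).
print(round(min_detectable(0.67, 35), 2))
```

Under these assumptions the result for the $n = 35$, $\alpha = 0.01$ entry is close to the tabulated minimum detectable correlation; entries for small $n$ are sensitive to the two-digit rounding of the thresholds in Table 2.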

Next we assess the fidelity of the familywise error predictions in Table 2 by comparing
them to empirical error rates determined by simulation. To obtain the empirical values a
set of tables like Table 2 was generated for each targeted value of $\beta$, ranging from 0.6 to
0.9, and the obtained predicted threshold value $\rho$ was used to screen the sample correlation
matrix. We simulated 4000 replicates to construct relative frequencies of empirical false
positive rates $\alpha$ and empirical true positive rates $\beta$ for the same parameters $p$, $n$ as were
used to generate the analytical predictions in the tables. Figure 4 shows the predicted $(\alpha, \beta)$
operating points (diamonds) and actual $(\alpha, \beta)$ operating points (integers), determined by
simulation for different values of $n$. Figure 4 demonstrates that our asymptotic predictions
are accurate for relatively large values of $\alpha$, small values of $n$, and finite $p$.

Theoretical vs empirical performance guarantees

Figure 4: Comparison between predicted (diamonds) and actual (integers) operating points
$(\alpha, \beta)$ for persistent correlation screening thresholds determined by the theory used to generate
Table 2. Each integer is located near an operating point and indexes the sample size
$n$ over the six values $n = 10, 15, 20, 25, 30, 35$. These numbers are color coded according to
the target value of $\beta$.

5.2 Experimental results 

We applied the correlation screening theory to a dataset downloaded from the public Gene
Expression Omnibus (GEO) NCBI web site [5]. This data was collected and analyzed by
the authors of [4]. The dataset consists of 108 Affymetrix HU133 Genechips containing
$p = 22{,}283$ gene probes hybridized from peripheral blood samples taken from 6 individuals
at 5 time points (0, 1, 2, 4 and 12 hours) on four independent days under $m = 4$ treatments:
intake of alcohol, grape juice, water, or red wine. According to the GEO Summary of
the authors' analysis of this data: "Results may contribute to elucidating the mechanisms
underlying the cardioprotective effects of red wine."

After removing samples taken at pretreatment baseline (time 0) there remained $n = 87$
samples distributed over the treatments as: $n_1 = 20$ (alcohol), $n_2 = 22$ (grape juice), $n_3 = 23$
(water), and $n_4 = 22$ (wine). Figure 5 gives a visualization of the U-scores for each treatment.
Observe that the U-scores display non-uniformity on the sphere $S_2$. We applied correlation
screening to the data as follows. As the numbers of samples differ in each treatment we
constrained the screening thresholds to equalize the four per-treatment auto-screening error
rates, as explained in Sec. 3.

Figure 5: 3-dimensional projections of the U-scores for the experimental beverage data under
each of the treatments 1, 2, 3, 4. For visualization the 22,283 variables (gene probes) were
downsampled by a factor of 8 and a randomly selected set of four samples in each treatment
was used to produce these figures.

There are $2^4 - 1$ possible auto-screening and persistency-screening analysis combinations
that can be performed over the 4 treatments {1,2,3,4}. Using our approximation to false
positive rate we fixed Type I FWER at level 10~^ and determined the 4 auto-screening
thresholds and the 11 sets of persistency screening thresholds. Correlation screening was
performed on the sample correlation matrix of all 22,283 gene probes. These thresholds
resulted in 15 different sets of discoveries in relative numbers shown in Table 3.

Treatment combinations                        Number of genes discovered
{1}, {2}, {3}, {4}                            51, 52, 96, 518
{1,2}, {1,3}, {1,4}, {2,3}, {2,4}, {3,4}      493, 748, 1069, 677, 864, 1445
{2,3,4}, {1,3,4}, {1,2,4}, {1,2,3}            2242, 2530, 1893, 1690
{1,2,3,4}                                     3313

Table 3: Number of genes discovered by auto-screening (top row) and persistency screening
(lower three rows) for various combinations of treatments in the experimental data. Auto-screening
threshold determined using our approximation to Type I error of level 10~^.
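
The count of analysis combinations can be reproduced by enumerating the non-empty subsets of the four treatments. This is a small illustrative sketch; the variable names are hypothetical.

```python
from itertools import combinations

treatments = (1, 2, 3, 4)

# Every non-empty subset of treatments defines one screening analysis.
screens = [set(c) for r in range(1, len(treatments) + 1)
           for c in combinations(treatments, r)]
auto = [s for s in screens if len(s) == 1]        # single-treatment screens
persistent = [s for s in screens if len(s) > 1]   # multi-treatment screens

print(len(screens), len(auto), len(persistent))  # prints 15 4 11
```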

To explore the relations between the different sets of genes discovered in each screen
we plot a directed set-inclusion graph in Fig. 6. The sizes of the 15 nodes correspond to
the length of the list of discovered genes at FWER 10~^ under the persistency screening
combination that is indicated by the node label. The nodes are arranged in 3 concentric
rings with an inner ring corresponding to higher degree of persistency (persistency over more
treatments) than an outer ring. Edges are shown only between nodes for which at least 90%
of the genes in one node are contained in the other node, and the thickest edges correspond to 100%
set inclusion. There are no edges between different auto-correlation screens (nodes labeled
1,2,3,4). Note also the preponderance of directed edges with arrows pointing from outer rings
towards inner rings as contrasted with edges between nodes on the same ring or pointing
to outer rings. As compared to the other three treatments, treatment 2 (water) generates a
lower proportion of auto-correlation screening genes that are also persistent genes.
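
The edge rule of the set-inclusion graph can be written down in a few lines. The sketch below is a hedged illustration on made-up gene lists, not the paper's data; the function name and the node labels are hypothetical.

```python
def inclusion_edges(discoveries, min_fraction=0.9):
    """Directed edges (i, j, fraction): at least min_fraction of node i's
    genes are also discovered in node j."""
    edges = []
    for i, genes_i in discoveries.items():
        if not genes_i:
            continue  # an empty list cannot define an inclusion fraction
        for j, genes_j in discoveries.items():
            if i == j:
                continue
            fraction = len(genes_i & genes_j) / len(genes_i)
            if fraction >= min_fraction:
                edges.append((i, j, fraction))
    return edges


# Toy gene lists, made up for illustration only.
toy = {
    "1": {"g1", "g2", "g3"},
    "1,2": {"g1", "g2", "g3", "g4", "g5"},
    "2": {"g6", "g7"},
}
print(inclusion_edges(toy))
```

In the toy example only the edge from node "1" to node "1,2" is produced, with fraction 1.0, which corresponds to a thickest (100% inclusion) edge in the figure's convention.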

In Figure 7 we show a 774 node subnetwork of the correlation network corresponding to
the 3313 discoveries of genes whose correlation persists over all four treatments. Two genes
in this subnetwork are connected by an edge only if the sample correlation between them
persists over all four treatments. Thus, as contrasted to the original 3313 node network of
genes having any correlation that persists over treatments (persistent nodes), Fig. 7 shows
the subnetwork of genes whose mutual correlations persist (persistent edges). Observe the
presence of a giant component of 516 genes shown in the figure as the central connected
component.

Figure 6: Set-inclusion graph between genes discovered by correlation screening in various
combinations of treatments. Size of node is proportional to the log of number of associated
correlation screening discoveries given in Table 3. A directed edge from node $i$ to node $j$
exists if at least 90% of the genes discovered in node $i$ are also discovered in node $j$ and
the thickest edges indicate 100% set inclusion. The asymmetry of the diagram indicates that
treatments have different effects on gene expression. The paucity of edges to and from grape
juice ("2") and wine ("4") indicates that most of the genes discovered in auto-screening are
not persistent across treatments.

Figure 7: 774 gene subnetwork of the 3313 gene persistent-correlation network across all
four treatments corresponding to the last row of Table 3. Two nodes in this network are
linked by an edge if for all 4 treatments their sample correlation is above the 10~^ FWER
correlation-screening threshold.
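
The distinction between persistent nodes and persistent edges can be made concrete with a short sketch. It keeps a pair of variables only when its sample correlation magnitude exceeds the threshold in every treatment, and then extracts the largest connected component of the resulting graph. The toy correlation matrices, thresholds and function names are made up for illustration.

```python
def persistent_edges(corr_by_treatment, thresholds):
    """Pairs (i, j), i < j, whose |correlation| exceeds the threshold in
    every treatment."""
    p = len(corr_by_treatment[0])
    edges = []
    for i in range(p):
        for j in range(i + 1, p):
            if all(abs(R[i][j]) > t
                   for R, t in zip(corr_by_treatment, thresholds)):
                edges.append((i, j))
    return edges


def largest_component(edges):
    """Vertex set of the largest connected component (depth-first search)."""
    neighbors = {}
    for i, j in edges:
        neighbors.setdefault(i, set()).add(j)
        neighbors.setdefault(j, set()).add(i)
    seen, best = set(), set()
    for start in neighbors:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend(neighbors[v] - component)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Toy 4 x 4 sample correlation matrices for two treatments (made up).
R_a = [[1.0, 0.9, 0.8, 0.1], [0.9, 1.0, 0.85, 0.2],
       [0.8, 0.85, 1.0, 0.3], [0.1, 0.2, 0.3, 1.0]]
R_b = [[1.0, 0.95, 0.2, 0.1], [0.95, 1.0, 0.9, 0.1],
       [0.2, 0.9, 1.0, 0.9], [0.1, 0.1, 0.9, 1.0]]
edges = persistent_edges([R_a, R_b], thresholds=(0.7, 0.7))
print(edges, largest_component(edges))
```

In the toy data variable 3 exceeds the threshold in treatment b only and variables 0 and 2 in treatment a only, so neither of those pairs contributes a persistent edge.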

6 Conclusions 

We have presented theory that yields asymptotic approximations for large scale correlation
screening within a single treatment and across multiple treatments. We obtained expressions
for the mean number of discoveries that depend on Bhattacharyya divergences [3].
Expressions for phase transition thresholds were established. The theory applies to large
scale screening of sample correlation when the true correlation is sparse or approximately
sparse. Put another way, the theory applies to screening for star motifs in a sparse graph
associated with a thresholded sample correlation matrix. This theory can be extended to
screening more general correlation motifs, e.g. triangles, chains, and higher order transitive
correlations. It can also be extended to screening sparse partial correlation matrices.

Supplemental Materials 

Proofs of propositions, lemmas and corollary: Proofs of Propositions 1, 2 and 3; definitions
for proofs, a fundamental lemma, and a corollary.

Supplemental Materials

A Proofs of Propositions 

A.1 Definitions and fundamental lemma

Here we give the principal definitions used in this Appendix/Supplemental Section.
Definitions: In the paper we defined averaged densities of one or two variables such as
$f_{U_*}$, $f_{U_*^a,U_\bullet^b}$, $f_{U_*,U_{*-\bullet}}$. For averages over more than two indices, required in the proofs
developed below, we introduce the following notation for $k$-fold averaging. For fixed integer
$i$ define

$$\mathrm{avg}_{i_1 \neq \cdots \neq i_k} f_{U_{i_1},\ldots,U_{i_k},U_i}(u_1,\ldots,u_k,v) = \left(p(p-1)\cdots(p-k+1)\right)^{-1} \sum_{i_1 \neq \cdots \neq i_k} f_{U_{i_1},\ldots,U_{i_k},U_i}(u_1,\ldots,u_k,v), \qquad (A.1)$$

and similarly for $\mathrm{avg}_{i_1 \neq \cdots \neq i_k} f_{U_{i_1},\ldots,U_{i_k}|U_i}$. When all of the variables $U_{i_j}$ are from the same
treatment, the indices $i_1, \ldots, i_k$ run over the range $1, \ldots, p$ and exclude the index $i$. When
there are two treatments, as in $\mathrm{avg}_{i_1 \neq \cdots \neq i_k} f_{U^b_{i_1},\ldots,U^b_{i_k}|U^a_i}$, the indices $i_1, \ldots, i_k$ run over the
same range but include $i$.

Thus we have, for example, $\mathrm{avg}_{i,j}\{f_{U_i^a,U_j^b}\} = f_{U_*^a,U_\bullet^b}$ and $\mathrm{avg}_{i \neq j}\{f_{U_i,U_j}\} = f_{U_*,U_{*-\bullet}}$.
When there is no risk of confusion, we will write the averaging operator avgj^_ instead of 

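Define the least upper bound $M_{k|1}$ on any $k$-th order conditional U-score density
$f_{U_{i_1},\ldots,U_{i_k}|U_{i_{k+1}}}(u_1,\ldots,u_k|u_{k+1})$:

$$M_{k|1} = \max_{i_1 \neq \cdots \neq i_{k+1}} \left\| f_{U_{i_1},\ldots,U_{i_k}|U_{i_{k+1}}} \right\|_\infty, \qquad (A.2)$$

where for any function $g(u_1,\ldots,u_k)$ of $u_i \in S_{n-2}$, $i = 1,\ldots,k$, $\|g\|_\infty$ denotes the sup norm

$$\|g\|_\infty = \sup_{u_1,\ldots,u_k \in S_{n-2} \times \cdots \times S_{n-2}} |g(u_1,\ldots,u_k)|.$$

Similarly define $M_{k|2}$ as:

$$M_{k|2} = \max_{i_1 \neq \cdots \neq i_{k+2}} \left\| f_{U_{i_1},\ldots,U_{i_k}|U_{i_{k+1}},U_{i_{k+2}}} \right\|_\infty. \qquad (A.3)$$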



Define the maximal gradient of the average pairwise density 



Mi|i = max sup 11 Vu/ui|u,(u|v)| 

'^3 veSn-2 



■-v\\2 



(A.4) 



where $\nabla_u = [\partial/\partial u_1, \ldots, \partial/\partial u_{n-1}]^T$ is the gradient operator.

For two treatments $a$, $b$ we define the above quantities analogously except that the single
treatment U-score distribution is replaced by the two treatment distribution $f_{U^a,U^b}$. For
example $M_{k|1}$ and $\dot{M}_{1|1}$ become



max 



/i 



U? ,...,U'? lu? 



(A.5) 



and 



M{ii = max sup 



Vu/u"|u'!(u|v) 



(A.6) 



Weak dependency coefficients

For a single treatment, let $\delta_i$ denote the degree of node $X_i$ in the population correlation
graph over $X = [X_1, \ldots, X_p]$. For given integer $k$, $0 \le k < p$, define



min(fe,i5i) 



(A.7) 



When $k \ge \delta_i$ these are indices of the nearest neighbors of $X_i$ amongst $\{X_j\}_{j \neq i}$. When $k < \delta_i$
these are the $k$-nearest neighbors ($k$-NN) of $X_i$. For a pair of U-scores $U_i, U_j$ define the
$p - 2 - k$ "complementary $k$ nearest neighbors" $U_{A_k(i,j)} = \{U_l : l \in A_k(i,j)\}$ where



(A.8) 



with $A^c$ denoting set complement of $A$. The complementary $k$-NN's from $U_i$, $U_j$ include
all scores outside of their respective $k$-nearest-neighbor regions. For $i \neq j$ the dependency
coefficient between $U_i$, $U_j$ and their complementary $k$-NN's is defined as



(A.9) 



For two treatments a, 6 let ^",5^^ be the degrees of vertices respectively, in the 

population cross-correlation graph having an edge between Xf and X^, when pfj* ^ 0. Simi- 
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larly to (A. 7) define the indices of the /c- nearest neighbors of among 

min(A;,<5") 

AAf (i) = argmax,.,^...^,^^^^^^^„^ E 1^1 (A-10) 



and similarly define A/"^ {i) by replacing 5f with 6\ and with p\'-^. In analogy to (A. 8) 
define 

j) = {(Z,m) : Z e {Nf{j)y - {0,m e - {j}} . (A.ll) 

For a pair of U-scores U", the complementary /c-NN's in treatments a and b are {Uf, U^}(; „j^g^a6(jj 
The cross-dependency coefficient between U", and the complementary A;-NN's is defined 



1 1 — J 

as 



(A.12) 



Finally, let the average U-score weak dependency and weak cross-dependency coefficients
be given by arithmetic averages

$$\|\Delta_{p,k}\|_1 = \left(p(p-1)/2\right)^{-1} \sum_{i<j} \Delta_{p,k}(i,j), \qquad \|\Delta^{ab}_{p,k}\|_1 = p^{-2} \sum_{i,j=1}^{p} \Delta^{ab}_{p,k}(i,j). \qquad (A.13)$$

The average weak dependency coefficients (A.13) are a natural measure of sparsity and
weak dependence. For example, assume that there is no vertex of degree greater than $k$ in
the population correlation graph associated with $X$, that $X$ has an elliptical distribution
and that the rows of the data matrix $X$ are i.i.d. Then $\|\Delta_{p,k}\|_1 = 0$. Similarly, if the rows
of $[X^a, X^b]$ are i.i.d. elliptically distributed and no node in the population cross-correlation
graph has vertex degree exceeding $k$ then $\|\Delta^{ab}_{p,k}\|_1 = 0$.

A.2 Proofs of Propositions

The proofs of Props. 1-3 will use several fundamental results gathered in the following 
lemma. 

Lemma 1. Let $X_p$ be an $n \times p$ data matrix and let $\{U_i\}_{i=1}^p$ be the U-scores extracted from the
columns of $X_p$. Assume that the joint U-score density is bounded. Define $\phi_{ij}$ the indicator
function of the event $|r_{ij}| > \rho$ where $r_{ij} = U_i^T U_j$ is the sample-correlation coefficient and
$0 \le \rho < 1$. Then for any $i_1, \ldots, i_k \in \{1, \ldots, p\}$, $i_1 \neq \cdots \neq i_k \neq i$, $k \in \{1, \ldots, p-1\}$,

$$E\left[\prod_{j=1}^{k} \phi_{i i_j}\right] = \int_{S_{n-2}} dv \int_{A(r,v)} du_1 \cdots \int_{A(r,v)} du_k\; f_{U_{i_1},\ldots,U_{i_k},U_i}(u_1,\ldots,u_k,v) \qquad (A.14)$$

$$\le P_0^k a_n^k M_{k|1}, \qquad (A.15)$$

with $P_0 = P_0(\rho,n)$ defined in (2.6), $a_n = |S_{n-2}|$, and $M_{k|1}$ defined in (A.2). In (A.14)
$A(r,v) = C(r,v) \cup C(r,-v)$ is the union of spherical cap regions on $S_{n-2}$ centered at $v$ and
$-v$ with radius $r = \sqrt{2(1-\rho)}$.

Furthermore, defining $\theta_i = (p-1)^{-1} \sum_{j=1, j \neq i}^{p} \phi_{ij}$,

$$\left| E[\theta_i] - P_0 J(f_{U_i,U_{*-i}}) \right| \le 2 a_n P_0\, r\, \dot{M}_{1|1}. \qquad (A.16)$$

When $(p-1)P_0 \le 1$ we have the following inequality

$$\left| E[\phi_i] - (p-1)E[\theta_i] \right| \le \gamma_p \left((p-1)P_0\right)^2, \qquad (A.17)$$

and, for $i \neq j$,

$$\left| E[\phi_{ij}] - P_0 J(f_{U_i,U_j}) \right| \le 2 a_n P_0\, r\, \dot{M}_{1|1}, \qquad (A.18)$$

$$E[\phi_{ij}] \le a_n P_0 M_{1|1}, \qquad (A.19)$$

and for $i \neq j \neq k \neq l$

$$E[\phi_{ij}\phi_{jk}] \le a_n^2 P_0^2 M_{2|1}, \qquad E[\phi_{ij}\phi_{kl}] \le a_n^2 P_0^2 M_{2|2}. \qquad (A.20)$$


Proof of Lemma 1 

Fix p. Without any loss we can assume that the indices have been reindexed so that 
i = p. The representation (A. 14) follows directly from the fact that (f)ij is the indicator of 
\Jj e A(r, Uj); the event that the magnitude sample correlation between the i-th and j-th 
variable exceeds p, j ^ i. Application of the mean value theorem to the inner integral in 
(A. 14), and noting that |A(r, v)| = (inPo, with a„ = |iS'„_2|, yields the inequality (A. 15). 

We next establish (A. 16) and (A. 17). Using the definition of 9i and the integral relation 
(A.14) for E[(l)ij] 

E[e,] = \A{r,v)\ [ dv (iA;-^(v,v) + lA;-^(-v,v))+(5, (A.21) 

Sn-2 

where 5i is a residual that has magnitude upper bounded by 2rMi|i. To show relation 
(A. 17) start with the representation 0j — maxj^j (pij or, equivalently, 0j = 1 — Hj^ill ~ 4>ij)- 
Expansion of the product yields the p-term series expression 



m] = E mj] + E mn4>i,, ]+...+e 



n "^^i 



(A.22) 



where the indices in the summations and the product are indexing over the ranges $1, \ldots, p$.
There are $\binom{p-1}{k}$ summands in the $k$-th term on the right of (A.22) and, by (A.15), each of
these summands is bounded by $P_0^k a_n^k M_{k|1}$. Therefore, using the definition of $\theta_i$

< max{a^M,|a|^(^^^^)Po'= (A.23) 

Under the assumption $(p-1)P_0 \le 1$ the sum on the right hand side is bounded by
$((p-1)P_0)^2 (e - 3/2)$, which establishes (A.17). This latter bound follows from the elementary
inequalities



k=2 V / V / k=2 



Relations (A. 18) and (A. 19) are simply recapitulations of (A. 15) and (A. 20) is established 
analogously. This finishes the proof of Lemma 1. □ 

A.3 Proof of Prop. 1 

We divide the proof into two pieces, the first dealing with the mean number of discoveries
(3.5) and the second with the Poisson limit. Both parts use the following direct consequence
of the expression (2.8)

$$p(p-1)P_0 = p(p-1)(n-2)^{-1} a_n (1-\rho_p^2)^{(n-2)/2}\left(1 + O(1-\rho_p^2)\right),$$

so that, as $p \to \infty$,

$$p(p-1)P_0 \to e_n a_n/(n-2), \qquad (A.25)$$

where $e_n$ is the constant in the rate of convergence of $\rho_p \to 1$ that was assumed in Prop. 1.
Furthermore, as $p(p-1)P_0$ converges, $(p-1)P_0$ converges to zero.

By (A.17) of Lemma 1, when $(p-1)P_0 \le 1$ the number of discoveries $N = \sum_{i=1}^{p} \phi_i$ has
mean that satisfies

$$\left| E[N] - (p-1)\sum_{i=1}^{p} E[\theta_i] \right| \le p^{-1}\left(p(p-1)P_0\right)^2 \gamma_p. \qquad (A.26)$$

Therefore, $E[N]$ converges to $(p-1)\sum_{i=1}^{p} E[\theta_i]$ with rate at least $O(p^{-1})$.



Next consider the difference $(p-1)\sum_{i=1}^{p} E[\theta_i] - p(p-1)P_0 J(f_{U_*,U_{*-\bullet}})$. As $J(f_{U_*,U_{*-\bullet}}) =
p^{-1}\sum_{i=1}^{p} J(f_{U_i,U_{*-i}})$, averaging over $i$ the relation (A.21), used to show (A.16) of Lemma 1,
provides the bound

$$\left| (p-1)\sum_{i=1}^{p} E[\theta_i] - p(p-1)P_0 J(f_{U_*,U_{*-\bullet}}) \right| \le r\,p(p-1)P_0\left(2 a_n \dot{M}_{1|1}\right), \qquad (A.27)$$

where $r = \sqrt{2(1-\rho)}$. Combining (A.26) and (A.27) yields

$$\left| E[N] - \left(p(p-1)P_0\right) J(f_{U_*,U_{*-\bullet}}) \right| \le p^{-1}\left(p(p-1)P_0\right)^2 \gamma_p + r\left(p(p-1)P_0\right)\eta_p, \qquad (A.28)$$

where $\eta_p = 2 a_n \dot{M}_{1|1}$. As $p(p-1)P_0$ converges to $e_n a_n/(n-2)$ and $r$ converges to zero, $E[N]$
converges to the stated limit. When $\dot{M}_{1|1} = O(1)$ and $n > 4$ the term involving $r$ dominates
and the bound is of order $O(\sqrt{1-\rho_p}) = O(p^{-2/(n-2)})$. This completes the first part of the
proof.

We next show the stated limit $P(N > 0) \to 1 - \exp(-\Lambda)$. Let $\phi_{ij}$ be the indicator of the
event $|r_{ij}| > \rho_p$ as defined in Lemma 1. Then $N_e = \sum_{i>j} \phi_{ij} = \frac{1}{2}\sum_{i \neq j} \phi_{ij}$ is the number of
edges in the thresholded empirical correlation graph and $N = \sum_{i=1}^{p} \max_{j: j \neq i} \phi_{ij}$ is the number
of vertices of positive degree. Since $N = 0$ if and only if $N_e = 0$: $P(N > 0) = P(N_e > 0)$.
Thus the stated limit will follow from: (1) convergence of the distribution of $N_e$ to a Poisson
law with rate $\Lambda = E[N_e]$; (2) convergence of $\Lambda$ to one half of the right hand side of (3.5).
Assertion (2) follows from (A.18) and the obvious identity $E[N_e] = \mathrm{avg}_{i \neq j} E[\phi_{ij}]\, p(p-1)/2$.
It remains to show (1).
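
Define the sets of index pairs $C = \{(i,j) : 1 \le i < j \le p\}$ and $B_{ij} = \{(l,m) : l \in
N_k(i), m \in N_k(j)\} \cap C$. Observe that $|B_{ij}| \le k(k-1)/2$. Let $N^*$ be a Poisson random
variable with rate $\Lambda = E[N_e]$. With these definitions the Chen-Stein theorem [2, Thm. 1]
provides a bound on the total variation distance between the distribution of $N_e$ and that of
$N^*$:

$$\max_A \left| P(N_e \in A) - P(N^* \in A) \right| \le b_1 + b_2 + b_3 \qquad (A.29)$$

where

$$b_1 = \sum_{(i,j) \in C}\ \sum_{(l,m) \in B_{ij}} E[\phi_{ij}]E[\phi_{lm}], \qquad b_2 = \sum_{(i,j) \in C}\ \sum_{(l,m) \in B_{ij} - \{(i,j)\}} E[\phi_{ij}\phi_{lm}],$$

and, for $p_{ij} = E[\phi_{ij}]$,

$$b_3 = \sum_{(i,j) \in C} E\left[\left| E\left[\phi_{ij} - p_{ij} \mid \{\phi_{lm} : (l,m) \notin B_{ij} \cup \{(i,j)\}\}\right] \right|\right].$$

Applying the bound (A.19) to the summand of $b_1$ we obtain

$$b_1 \le \frac{p(p-1)}{2}\,\frac{k(k-1)}{2}\,\max_{i<j} E^2[\phi_{ij}] \le O(p^2 k^2 P_0^2) = O\left((k/p)^2\right),$$

as $p(p-1)P_0 = O(1)$. Likewise, the bound (A.20) applied to $b_2$ gives

$$b_2 \le \frac{p(p-1)}{2}\,\frac{k(k-1)}{2}\,\max_{(i,j) \neq (l,m)} E[\phi_{ij}\phi_{lm}] \le p^2 k^2 P_0^2 a_n^2 M = O\left((k/p)^2\right),$$

where $M = \max\{M_{2|1}, M_{2|2}\}$.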

Furthermore, with $A_k(i,j)$ the index set defined in (A.8),



E [E[(Pij - pij\{(Pim : {l,m) ^ Bij U = E [^[0^^- - pij|UA,(ij)]] 



^5 



^3 



/Ui,U,(Ui,Uj) 

< a„PoAp,fe(i,j). 



Hence, as $b_1 + b_2 + b_3 = O(\max\{(k/p)^2, \|\Delta_{p,k}\|_1\})$ and $k = o(p)$, (A.29) establishes that $N_e$
converges in distribution to a Poisson random variable. □

The rate of convergence of $E[N]$ to $(p-1)\sum_{i=1}^{p} E[\theta_i]$ specified by (A.17) is $O(p^{-1})$, while,
when $n > 4$, its rate of convergence to $p(p-1)P_0 J(f_{U_*,U_{*-\bullet}})$ is dominated by the slower rate
$O(p^{-2/(n-2)})$. In the case that the rows of $X$ are i.i.d. elliptically distributed with row-sparse
covariance matrix $\Sigma$ of degree $k$ the rate of convergence of the probability $P(N > 0)$ to
$1 - \exp(-\Lambda)$ is at worst $O(\max\{(k/p)^2\})$.

A.4 Proof of Prop. 2

The technical details for the proof of Prop. 2 are similar to those of the proof of Prop. 1.
The main difference is that a discovery ($N^{ab} > 0$) occurs when a U-score $U_j^b$ from treatment
$b$ is in the $r$ neighborhood $A(r, U_i^a)$ of U-score $U_i^a$ from treatment $a$. Therefore, as contrasted
to the auto-screening case, there are $p$ possible $b$-treatment U-scores that can fall into the
neighborhood of $U_i^a$ instead of the $p - 1$ remaining $a$-treatment U-scores considered in
auto-correlation screening. Due to this difference, the factor $p - 1$ is replaced by $p$ in all
bounds and representations and the indexing is no longer restricted to distinct indices in
$\{U_i^a\}_i$ and $\{U_j^b\}_j$.

The stated limiting expression for $P(N^{ab} > 0)$ is established by applying the Chen-Stein
theorem [2, Thm. 1] to the number of edges $N_e = \sum_{i,j} \phi^{ab}_{ij}$ of the thresholded empirical
cross-correlation graph, where $\phi^{ab}_{ij}$ is the indicator of the event $|r^{ab}_{ij}| > \rho_p$. It is easily
established that $E[N_e] = E[N^{ab}]$. Define the sets $C = \{(i,j) : i,j = 1,\ldots,p\}$ and
$B^{ab}_{ij} = \{(l,m) : l \in \mathcal{N}^{ab}_k(i),\ m \in \mathcal{N}^{ab}_k(j)\}$, where $\mathcal{N}^{ab}_k(i)$ is defined in (A.10).
Observe that $|B^{ab}_{ij}| \le k^2$ and that the set of scores $\{U^a_l, U^b_m\}$ such that
$(l,m) \notin B^{ab}_{ij} \cup \{(i,j)\}$ is precisely the set $\{U^a_l, U^b_m\}_{(l,m) \in A^{ab}_k(i,j)}$, where $A^{ab}_k(i,j)$ is
given by (A.11). In analogous manner to the proof of Prop. 1 the three terms $b_1$, $b_2$ and $b_3$
in (A.29) can be bounded by $b_1 \le p^2k^2P_0^2a_n^2(M^{ab}_{1|1})^2$,
$b_2 \le p^2k^2P_0^2a_n^2\max(M^{ab}_{2|1}, M^{ab}_{2|2})$ and $b_3 \le p^2P_0a_n\|\Delta^{ab}_{p,k}\|_1$, where $\|\Delta^{ab}_{p,k}\|_1$ is given by (A.12).
Therefore $b_1 + b_2 + b_3 \le O\left(\max\{(k/p)^2, \|\Delta^{ab}_{p,k}\|_1\}\right)$ and we conclude that if $k = o(p)$ and
$\|\Delta^{ab}_{p,k}\|_1$ converges to zero then $N_e$ converges to a Poisson random variable. Furthermore,
from (A.18) it is easily verified that $E[N^{ab}] = E[N_e]$. Thus, as $N^{ab} = 0$ if and only if
$N_e = 0$, $P(N^{ab} > 0) = P(N_e > 0) \to 1 - \exp(-\Lambda^{ab})$ with $\Lambda^{ab} = E[N^{ab}]$. □
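The relation between the number of cross-correlation discoveries and the number of edges in the thresholded cross-correlation graph can be made concrete with a small sketch. The function name `cross_counts` and the input layout (a list of rows, with `r_ab[i][j]` the sample cross-correlation between variable `i` of treatment a and variable `j` of treatment b) are assumptions made here for illustration; they do not come from the paper.

```python
def cross_counts(r_ab, rho):
    """Return (N_ab, N_e) for a cross-correlation matrix and threshold rho.

    N_e  : number of entries with |r_ab[i][j]| > rho (edges of the graph).
    N_ab : number of treatment-a variables having at least one such entry.
    """
    # phi[i][j] is the indicator of the event |r_ab[i][j]| > rho.
    phi = [[1 if abs(r) > rho else 0 for r in row] for row in r_ab]
    n_edges = sum(sum(row) for row in phi)
    n_discoveries = sum(1 for row in phi if any(row))
    return n_discoveries, n_edges
```

Since every discovery contributes at least one edge and every edge belongs to some discovery, `n_discoveries` is zero exactly when `n_edges` is zero, which is the equivalence used at the end of the proof.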

A.5 Proof of Prop. 3

To simplify notation we define $P_{0,a} = P_0(\rho_p, n_a)$ and $P_{0,b} = P_0(\rho_p, n_b)$. Similarly to the proof of
Prop. 1, a direct consequence of the expression (2.8) is that for any $\alpha \in [0,1]$,
$p^{1/2}(p-1)P_{0,a}^{\alpha}P_{0,b}^{1-\alpha}$ is convergent and therefore $(p-1)P_{0,a}^{\alpha}P_{0,b}^{1-\alpha}$ converges to zero.

As in Lemma 1 define $\phi^a_i = \max_{j \neq i} \phi^a_{ij}$, the indicator function of the event that in
treatment $a$ there is some variable $j \neq i$ whose sample correlation with the $i$-th variable exceeds
$\rho_p$. Similarly define $\phi^b_i$. The total number of persistent discoveries is $N^{a\wedge b} = \sum_{i=1}^p \phi^a_i\phi^b_i$ and,
since the treatments are independent, $E[N^{a\wedge b}] = \sum_{i=1}^p E[\phi^a_i]E[\phi^b_i]$. Define the independent
random variables $\theta^a_i = (p-1)^{-1}\sum_{j \neq i} \phi^a_{ij}$ and $\theta^b_i = (p-1)^{-1}\sum_{j \neq i} \phi^b_{ij}$.

Consider the difference 

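\[ E[\phi^a_i]E[\phi^b_i] - (p-1)^2E[\theta^a_i]E[\theta^b_i]
   = \big(E[\phi^a_i] - (p-1)E[\theta^a_i]\big)\big(E[\phi^b_i] - (p-1)E[\theta^b_i]\big) \]
\[ \qquad + (p-1)E[\theta^a_i]\big(E[\phi^b_i] - (p-1)E[\theta^b_i]\big) + (p-1)E[\theta^b_i]\big(E[\phi^a_i] - (p-1)E[\theta^a_i]\big) \]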
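The decomposition above is the elementary identity $AB - ab = (A-a)(B-b) + a(B-b) + b(A-a)$ with $A = E[\phi^a_i]$, $B = E[\phi^b_i]$, $a = (p-1)E[\theta^a_i]$ and $b = (p-1)E[\theta^b_i]$. A minimal numerical check, with a hypothetical helper name and arbitrary illustrative values, is:

```python
def product_difference_terms(A, B, a, b):
    """Split A*B - a*b into the three terms of the identity
    A*B - a*b = (A - a)*(B - b) + a*(B - b) + b*(A - a)."""
    cross_term = (A - a) * (B - b)
    a_term = a * (B - b)
    b_term = b * (A - a)
    return cross_term, a_term, b_term


# Arbitrary illustrative values (not taken from the paper).
A, B, a, b = 0.031, 0.027, 0.030, 0.025
terms = product_difference_terms(A, B, a, b)
residual = abs(sum(terms) - (A * B - a * b))  # zero up to rounding error
```

The first term is a product of two small differences, which is why it is of smaller order than the other two once (A.17) is applied.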

Sum over $i$ and apply inequalities (A.17) and (A.15) of Lemma 1 to obtain, for $p$ large enough
to make $(p-1)P_{0,a} < 1$ and $(p-1)P_{0,b} < 1$,



1=1 



< 



- + - 1)«)^ 

(A.30) 



where 7^, 7^ are defined as 7^ in Lemma 1 using Mk\i = M^|^ and M^ii = Mj^^^, respectively, 
and rjp = UnM^^'y^, rjp = anM^^'y^. As the right hand side of the above equation is 
this establishes that



i=l 



By (A.16) of Lemma 1 this limit is equal to (3.16).

It remains to establish the stated limit of the probability $P(N^{a\wedge b} > 0)$. Similar to the
proof of Prop. 1, let $\phi^a_{ij}$ and $\phi^b_{ij}$ be indicators of the events $|r^a_{ij}| > \rho_p$, $|r^b_{ij}| > \rho_p$, respectively.
Then $\phi^a_i\phi^b_i = \max_{j \neq i,\, l \neq i} \phi^a_{ij}\phi^b_{il}$ and $N^{a\wedge b} = \sum_{i=1}^p \phi^a_i\phi^b_i$. Let
$N_{d_ad_b} = \sum_{i=1}^p \sum_{j \neq i} \sum_{l \neq i} \phi^a_{ij}\phi^b_{il} = \sum_{i=1}^p d^a_i d^b_i$, where $d^a_i$ and $d^b_i$ denote the degrees of vertex $i$
in the respective thresholded empirical correlation graphs associated with each treatment. We
will show that $N_{d_ad_b}$ is asymptotically Poisson distributed with rate $\Lambda = E[N^{a\wedge b}]$. Since
$N^{a\wedge b} = 0$ if and only if $N_{d_ad_b} = 0$, this will establish the stated limiting expression for
$P(N^{a\wedge b} > 0)$.
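The two counts used in this step, the number of persistent discoveries and the sum of degree products, can be computed directly from two sample correlation matrices. The sketch below uses hypothetical names (`symmetric`, `degrees`, `persistent_counts`) and plain nested lists; it is only an illustration of the definitions, not the authors' implementation.

```python
def symmetric(p, entries):
    """Build a p x p symmetric matrix with unit diagonal from a dict
    {(i, j): value} of off-diagonal entries (illustrative helper)."""
    r = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for (i, j), value in entries.items():
        r[i][j] = value
        r[j][i] = value
    return r


def degrees(r, rho):
    """Degree of each vertex in the graph with edges |r[i][j]| > rho, j != i."""
    p = len(r)
    return [sum(1 for j in range(p) if j != i and abs(r[i][j]) > rho)
            for i in range(p)]


def persistent_counts(r_a, r_b, rho):
    """Return (number of persistent discoveries, sum of degree products)."""
    d_a = degrees(r_a, rho)
    d_b = degrees(r_b, rho)
    # A variable is a persistent discovery if its degree is positive in both graphs.
    n_persistent = sum(1 for x, y in zip(d_a, d_b) if x > 0 and y > 0)
    n_degree_products = sum(x * y for x, y in zip(d_a, d_b))
    return n_persistent, n_degree_products
```

Each variable contributes a positive degree product exactly when it is a persistent discovery, so the first count is zero if and only if the second is zero.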

First we establish that $E[N_{d_ad_b}]$ converges to the same limit as does $E[N^{a\wedge b}]$. Since the
treatments are independent

\[ E[N_{d_ad_b}] = \sum_{i=1}^p\ \sum_{j \neq i} E[\phi^a_{ij}]\ \sum_{l \neq i} E[\phi^b_{il}]. \]

Invoking (A. 18) from Lemma 1, 

- Pa,oJ (/u^U|) I < 2anPaflr;,K\l: 
where $r_p = \sqrt{2(1-\rho_p)}$, and likewise for treatment $b$. Therefore, from (A.31),

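\[ E[N_{d_ad_b}] = p(p-1)^2P_{0,a}P_{0,b}\left( p^{-1}\sum_{i=1}^p J(\overline{f_{U^a_i,U^a_{*-i}}})\,J(\overline{f_{U^b_i,U^b_{*-i}}}) + O(r_p) \right), \]

where $O(r_p) \to 0$ as $\rho_p \to 1$. Therefore $E[N_{d_ad_b}]$ converges to the limit on the right side
of (3.16).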

Define $C = \{(l,m,n) : 1 \le l,m,n \le p,\ m \neq l,\ n \neq l\}$. For given integer $k$, define the
index set

\[ B_{i,j,l} = \{(i',j',l') : i' \in \mathcal{N}^a_k(i) \cup \mathcal{N}^b_k(i),\ j' \in \mathcal{N}^a_k(j),\ l' \in \mathcal{N}^b_k(l)\} \cap C, \]

where $\mathcal{N}^a_k(i)$ is the $k$-neighborhood defined in (A.7) with $\{U_j\}$ replaced by $\{U^a_j\}$, and $\mathcal{N}^b_k(i)$
is similarly defined. The cardinality of $B_{i,j,l}$ is bounded by $k^3$. Letting $N^*$ be Poisson with
rate $E[N_{d_ad_b}]$, application of the Chen-Stein theorem [2, Thm. 1] yields

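\[ \max_A |P(N_{d_ad_b} \in A) - P(N^* \in A)| \le b_1 + b_2 + b_3 \tag{A.32} \]

where

\[ b_1 = \sum_{(i_1,i_2,i_3)\in C}\ \sum_{(j_1,j_2,j_3)\in B_{i_1,i_2,i_3}} E[\phi^{ab}_{i_1i_2i_3}]\,E[\phi^{ab}_{j_1j_2j_3}], \]

\[ b_2 = \sum_{(i_1,i_2,i_3)\in C}\ \sum_{(j_1,j_2,j_3)\in B_{i_1,i_2,i_3}-\{(i_1,i_2,i_3)\}} E[\phi^{ab}_{i_1i_2i_3}\phi^{ab}_{j_1j_2j_3}], \]

\[ b_3 = \sum_{(i_1,i_2,i_3)\in C} E\big[ E[\phi^{ab}_{i_1i_2i_3} - p_{i_1i_2i_3} \mid \{\phi^{ab}_{j_1j_2j_3} : (j_1,j_2,j_3) \notin B_{i_1,i_2,i_3} \cup \{(i_1,i_2,i_3)\}\}] \big], \]

with $p_{i_1i_2i_3} = E[\phi^{ab}_{i_1i_2i_3}]$.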

Next (A.19) and (A.20) are applied to bound $b_1$ and $b_2$. For $(i_1,i_2,i_3) \in C$,

\[ E[\phi^{ab}_{i_1i_2i_3}] = E[\phi^a_{i_1i_2}]\,E[\phi^b_{i_1i_3}] \le a_{n_a}a_{n_b}P_{0,a}P_{0,b}M^a_{1|1}M^b_{1|1}. \]



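We conclude that

\[ b_1 \le \gamma_0\, p^3k^3P_{0,a}^2P_{0,b}^2 = O\left((k/p)^3\right), \]

where $\gamma_0 = (M^a_{1|1}M^b_{1|1}a_{n_a}a_{n_b})^2$. Bounding $b_2$ requires more care. Start from

\[ b_2 = \sum_{(i_1,i_2,i_3)\in C}\ \sum_{(j_1,j_2,j_3)\in B_{i_1,i_2,i_3}-\{(i_1,i_2,i_3)\}} E[\phi^a_{i_1i_2}\phi^a_{j_1j_2}]\,E[\phi^b_{i_1i_3}\phi^b_{j_1j_3}]. \tag{A.33} \]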



The symmetry relation $\phi_{ij} = \phi_{ji}$ can cause three types of reductions in the above expression
over the range of indices of summation $i_1, i_2, i_3, j_1, j_2, j_3$ in (A.33). The first reduction is
$E[\phi^a_{i_1i_2}\phi^a_{j_1j_2}] = E[\phi^a_{i_1i_2}]$, which occurs when $i_1 = j_2$, $i_2 = j_1$, and the second is
$E[\phi^b_{i_1i_3}\phi^b_{j_1j_3}] = E[\phi^b_{i_1i_3}]$, which occurs when $i_1 = j_3$, $i_3 = j_1$. The third reduction occurs when
both of these two reductions occur simultaneously, which is possible if and only if $i_2 = i_3$ and
$j_2 = j_3$. These reductions affect the order of the summand in $P_{0,a}$ and $P_{0,b}$. For $i_1 \neq i_2$, $j_1 \neq j_2$,

\[ E[\phi^a_{i_1i_2}\phi^a_{j_1j_2}] \le
   \begin{cases}
     a_{n_a}P_{0,a}M^a_{1|1}, & i_1 = j_2,\ i_2 = j_1 \\
     a_{n_a}^2P_{0,a}^2\max\{M^a_{2|1}, M^a_{2|2}\}, & \text{o.w.}
   \end{cases} \]

and similarly for $E[\phi^b_{i_1i_3}\phi^b_{j_1j_3}]$. Hence



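\[ b_2 \le \gamma_1\, p^3k^3P_{0,a}^2P_{0,b}^2 + \gamma_2\, p^3k\left(P_{0,a}P_{0,b}^2 + P_{0,a}^2P_{0,b}\right) + \gamma_3\, p^2P_{0,a}P_{0,b}
   = O\left((k/p)^3\right) + O\left(p^{-1/2}(k/p)\right) + O\left(p^{-1}\right), \]

where the $\gamma_j$'s are constants depending on $M^a_{1|1}$, $M^a_{2|1}$, $M^a_{2|2}$ and $M^b_{1|1}$, $M^b_{2|1}$, $M^b_{2|2}$. We conclude that
$b_1$ and $b_2$ converge to zero at rates no worse than $O\left((k/p)^3\right)$ and $O\left(\max\{(k/p)^3, (k/p)p^{-1/2}, p^{-1}\}\right)$,
respectively.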

Finally we deal with the term $b_3$ in (A.32). Define $A^{ab}_k(i_1,i_2,i_3) = (B_{i_1,i_2,i_3} \cup \{(i_1,i_2,i_3)\})^c$.
Using the definition $\phi^{ab}_{ijl} = \phi^a_{ij}\phi^b_{il}$ and the statistical independence of $\phi^a_{ij}$ and $\phi^b_{il}$ the summand
of $b_3$ takes the form:



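\[ E\big[E[\phi^{ab}_{i_1i_2i_3} - p_{i_1i_2i_3} \mid U_{A^{ab}_k(i_1,i_2,i_3)}]\big]
   = E\big[E[\phi^a_{i_1i_2} - p^a_{i_1i_2} \mid U_{A^a_k(i_1,i_2)}]\,E[\phi^b_{i_1i_3} - p^b_{i_1i_3} \mid U_{A^b_k(i_1,i_3)}]\big] \]
\[ \qquad + p^b_{i_1i_3}E\big[E[\phi^a_{i_1i_2} - p^a_{i_1i_2} \mid U_{A^a_k(i_1,i_2)}]\big] + p^a_{i_1i_2}E\big[E[\phi^b_{i_1i_3} - p^b_{i_1i_3} \mid U_{A^b_k(i_1,i_3)}]\big] \tag{A.34} \]

where $p^a_{i_1i_2} = E[\phi^a_{i_1i_2}]$ and $A^a_k(i_1,i_2)$ is as defined in (A.8) for $X = X^a$ the variables in
treatment $a$. Analogous definitions hold for $p^b_{i_1i_3}$ and $A^b_k(i_1,i_3)$. Bounds on the two conditional
expectations on the right of (A.34) were obtained in the proof of Prop. 1. Using these results
in (A.34) and summing over $(i_1,i_2,i_3) \in C$ yields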

\h\<p'Pa,oP,,oan.aJ\A;j2\\Alj2+p'{P^^^^^ 

or $b_3 \le O\left(\max\{\|\Delta^a_{p,k}\|_1, \|\Delta^b_{p,k}\|_1\}\right)$. Since $k = o(p)$ and the dependency coefficients $\|\Delta^a_{p,k}\|_1$, $\|\Delta^b_{p,k}\|_1$
converge to zero, we conclude that $b_1 + b_2 + b_3$ converges to zero and therefore $N_{d_ad_b}$ converges
in distribution to a Poisson random variable. This completes the proof of Prop. 3. □

Corollary 1. Under the assumptions of Prop. 3 the individual treatment means $p^{-1/2}E[N^a]$
and $p^{-1/2}E[N^b]$ converge to their respective limits specified in Prop. 1.

Proof. Under the conditions stated in Prop. 3, $p^{1/2}(p-1)P_0(\rho_p, n_a)$ and
$p^{1/2}(p-1)P_0(\rho_p, n_b)$ converge to constants. Furthermore, from the inequality
(A.28) established in proving Prop. 1 (with $N = N^a, N^b$)

\[ \left| E[N]/\sqrt{p} - \sqrt{p}(p-1)P_0 J(\overline{f_{U_*,U_{*-\bullet}}}) \right|
   \le \gamma_p\left(\sqrt{p}(p-1)P_0\right)^2/\sqrt{p} + \eta_p\left(\sqrt{p}(p-1)P_0\right)\sqrt{2(1-\rho_p)} \tag{A.35} \]

and thus $E[N^a]/\sqrt{p}$ and $E[N^b]/\sqrt{p}$ are convergent. This establishes Corollary 1. □
We comment on the convergence rates in the three Propositions. The dominant distributional
convergence rates are identical if the row-sparse covariance parameter $k$ is fixed but
they differ if $k$ increases in $p$. Assume that the rows of $X$ are i.i.d. and elliptically distributed
with a covariance matrix $\Sigma$ that is row-sparse of degree $k$ with $k = o(p)$. Then for each of
the auto-screening, cross-screening and persistent-screening cases $P(N > 0)$ converges to a
Poisson probability of the form $1 - \exp(-\Lambda)$ at speed no worse than $O(p^{-1})$ if $k$ is constant.
On the other hand the speed can be at the slower rates $O\left((k/p)^2\right)$ for auto- and cross-correlation
screening and $O\left((k/p)^3\right)$ for persistent correlation screening if $k$ increases rapidly
with $p$. Moreover, the mean number of discoveries may converge to the stated limits
at slower rates. For example, the mean number of auto-correlation discoveries converges at 
rate not exceeding 0(max{p^\p^^/("^^)}) while the mean number of persistent discoveries 
converges at rate not exceeding 0(max{p~^/^,p~^/("~^^}), where n is the minimum of n^, ri},. 
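To see when the sparsity-driven terms become slower than $O(p^{-1})$, one can parametrize the growth of $k$ as $k = p^{\beta}$ with $0 \le \beta < 1$, which is an assumption introduced here only for illustration. Then $(k/p)^m = p^{-m(1-\beta)}$, so the term with power $m$ decays more slowly than $p^{-1}$ exactly when $m(1-\beta) < 1$. The helper names below are hypothetical, and the powers 2 and 3 are taken to be the auto/cross and persistent cases discussed above.

```python
def rate_exponent(beta, power):
    """Exponent e such that (k/p)**power = p**e when k = p**beta."""
    return -power * (1.0 - beta)


def slower_than_inverse_p(beta, power):
    """True if (k/p)**power decays more slowly than 1/p when k = p**beta."""
    return rate_exponent(beta, power) > -1.0


# Thresholds on beta above which the sparsity term dominates p**(-1):
# power 2 (auto- and cross-correlation screening): beta > 1/2
# power 3 (persistent-correlation screening):      beta > 2/3
```

Under this parametrization a constant $k$ corresponds to $\beta = 0$, for which both powers give terms that decay faster than $p^{-1}$.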